Development (history) and applications of speech recognition systems

The Development of Speech Recognition

Walk into almost any house around the world and you’re likely to find one or more of those ubiquitous little smart speakers scattered around. For the residents of these homes, the devices have become a key part of daily life, sharing details about meetings, travel plans, grocery lists and even weather reports. We’ve come to depend on them to simplify our lives and entertain us.

Smart speakers were not, however, our first introduction to speech recognition. With Apple and Siri, speech recognition took a huge leap forward in the early 2010s, helping get us to where we are now. But this technology didn’t happen overnight; it has been decades in the making.

Understanding Speech Recognition

Language and speech go back thousands of years. Computing, by contrast, is a much more recent development. Speech recognition software, or speech recognition technology, is an attempt to marry the two so that computing devices can understand and react to the human voice.

Speech recognition is, however, incredibly complex. As a child grows up, they learn by watching and listening to all of the different sounds around them. Over time, they associate those sounds with words and specific objects. Their brain builds unique patterns that they carry throughout their lives, helping them decipher accent, inflection and tone to extract meaning.

Training a computer is similar in some ways, but also very different. Humans seem to learn languages effortlessly, but that is primarily because we’ve learned how best to teach children. We do not yet have that same insight with computers. What we do know is that educating machines requires data and lots of research.

While we’ve made significant strides in improving the accuracy rate of speech recognition systems, there is still work to be done to better help computers understand different dialects and languages. Today, speech recognition works fairly well for most common tasks. In fact, companies like Google and IBM state that their algorithms for speech are close to 96% accurate – but getting to this point took time and effort.



The History of Speech Recognition Systems

The first true speech recognition system was developed in the early 1950s. This system, called “Audrey”, was developed by Bell Laboratories and focused on understanding numbers. In the following decade, IBM came out with a system that could respond to 16 words, called “Shoebox”.

The 1970s saw a huge leap forward in speech recognition, largely thanks to funding from the US government’s DARPA. With DARPA’s support, Carnegie Mellon created a system called “Harpy” that could understand around 1,000 words – roughly the vocabulary of an average three-year-old child.

The 1980s and 90s continued to see gradual improvements to speech recognition, but it wasn’t until the 2000s that another true transformation took place. In the early 2000s, speech recognition was 80% accurate, but with the launch of Google Voice and its cloud data centers, this accuracy quickly started to increase.

Google could correlate voice searches with the actual search results, learning to better predict what users were looking for. With the launch of Siri in 2011, Apple joined the race to improve speech recognition, helping to bring us to today’s accuracy rate of close to 96%.

How Speech Recognition is Used

When comparing voice and typing speed, the winner is clear. Humans speak an average of about 150 words per minute, while typists average around 40. The obvious question, then, is why we are not all talking to our computers instead of typing.

The truth is that, right now, the main restriction is reliability. While speech recognition claims a 96% accuracy rate, that is only true under very specific conditions. When multiple languages, accents or dialects are in play, accuracy drops quickly.

For speech to become ubiquitous as a means of input, devices need to be able to resolve the ambiguities inherent in our language. Homophones, for example – words that are pronounced the same but have different meanings – can cause even humans to second-guess what is being said. Computers have the same problem, and typically rely on the surrounding context to pick the most likely word.
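To make the homophone problem concrete, here is a deliberately tiny sketch of how context can break the tie. The phoneme key, the candidate words and the bigram counts below are all invented for illustration – a real recognizer would use a trained acoustic model and a large statistical language model, not hand-written tables.

```python
# Toy homophone disambiguation: several spellings share one
# pronunciation, so the recognizer scores each candidate by how
# often it follows the previous word in a (hypothetical) corpus.

HOMOPHONES = {"/tuː/": ["two", "too", "to"]}

# Invented bigram counts standing in for a real language model.
BIGRAM_COUNTS = {
    ("went", "to"): 90,
    ("went", "too"): 5,
    ("went", "two"): 1,
    ("bought", "two"): 70,
    ("bought", "too"): 2,
    ("bought", "to"): 10,
}

def disambiguate(prev_word: str, phoneme: str) -> str:
    """Return the candidate spelling most likely to follow prev_word."""
    candidates = HOMOPHONES[phoneme]
    return max(candidates, key=lambda w: BIGRAM_COUNTS.get((prev_word, w), 0))

print(disambiguate("went", "/tuː/"))    # → "to"
print(disambiguate("bought", "/tuː/"))  # → "two"
```

The same sound resolves to different words depending on what came before it – exactly the kind of contextual judgment humans make without noticing.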

Despite these limitations, speech recognition is making inroads in many different areas. While it initially launched on smartphones, it is now available on smart speakers, computers, cars and even smartwatches. And the future of speech recognition with its many different use cases is even more intriguing.

Speech in the workplace

Speech recognition is also impacting the corporate world with business applications suitable to almost every department. Historically, a personal assistant has been something only the most successful executive could afford. Today, almost anyone can access an AI digital assistant that helps keep them organized and on time.

In addition to simplifying work, speech can improve efficiency: dictating text is far faster than typing it, speeding up document creation. Office security, too, can be improved by pairing speech recognition with other biometric information in place of card swipes.

Real time translation services

Many companies are already developing real-time translation hardware built on speech recognition. These voice translators transcribe speech in one language and return a translation in another.

While this technology can already improve communication for many people around the world, it is restricted to the speech patterns and languages currently supported. These limitations are likely temporary, with research already underway to remove them.

Speech in customer service

Customers hate waiting for service. Banks have realized this, and with speech they now have the capability to remove some of those limitations. The Royal Bank of Canada (RBC), for instance, lets customers pay bills and even transfer money using just their voice and Siri.

Other banks let users check account balances, hear about payment dates and even make payments using Alexa. This technology is in the early stages of development but showing great promise in removing roadblocks.

Speech is not just for banking, though. Shopping can also be a pain, especially for items that are hard to find. Amazon’s Alexa can help by letting you add essential items to your cart with your voice, drawing on your purchase history.


We are at a tipping point in the evolution of speech recognition. Voice recognition will only grow more sophisticated over time, with more applications becoming mainstream. Millions of voice-assistant devices are already in use – and that number is only going to grow in the years to come.

Perhaps the best way to understand what the future holds when it comes to speech recognition is a simple quote from Brian Roemmele:

“The last 60 years of computing, humans were adapting to the computer. The next 60 years, the computer will adapt to us. It will be our voices that will lead the way; it will be a revolution and it will change everything.”