The challenge for speech recognition training data
Voice control systems are only as good as their speech recognition. The biggest challenge is training and optimizing these systems to handle the wide variety of ways in which people phrase voice commands.
Programming that ignores the factors of “human reason” and “human behavior” cannot produce an ideal speech recognition system. In many cases, users’ voice commands are not recognized or are misunderstood.
Users often have to repeat a command several times before the system responds correctly and displays the desired information. This is time-consuming and, while driving, distracting.
To recognize the individual voice commands of potential users, the system needs speech recordings from thousands of different people, each with their own commands and pronunciation.
The solution: creating data sets to improve speech recognition software
Thousands of our Clickworkers from different countries and regions record how they would issue a command to call up a predefined reaction x or a piece of information y via the infotainment system. Every voice recording differs, even within the same language, because of each Clickworker’s individual choice of words, word order, and pronunciation.
The speech recognition algorithms must also be trained to react to certain cues, such as keywords. In a second step, our Clickworkers therefore transcribe all the voice recordings and analyze the sentences to identify the keywords used and how often they occur.
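The keyword analysis in this second step can be illustrated with a minimal Python sketch. It is only an assumption about how such a count might be automated, not a description of the process our Clickworkers follow: the transcripts, the stop-word list, and the function name `keyword_frequencies` are all invented for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real project would use a fuller,
# language-specific list.
STOP_WORDS = {"the", "a", "to", "me", "please", "i", "my", "can", "you"}

def keyword_frequencies(transcripts):
    """Count how often each non-stop-word occurs across all transcripts."""
    counts = Counter()
    for text in transcripts:
        # Lowercase the sentence and split it into simple word tokens.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts

# Invented transcripts of three people asking for the same information.
transcripts = [
    "Navigate to the nearest gas station",
    "Please take me to a gas station",
    "Find the nearest petrol station",
]

freq = keyword_frequencies(transcripts)
for word, count in freq.most_common(3):
    print(word, count)
```

Even with three sentences, the count shows which words recur across differently phrased commands ("station" appears in all three) and which are one-off variants ("petrol"), which is the kind of cue a keyword-based system would need to learn.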
With the help of these recordings, manufacturers train their speech recognition software and optimize the infotainment system to respond to the many different ways in which individual users interact with it.