Thousands of Clickworkers record voice commands used to control car infotainment systems. These are then transcribed and analyzed, providing the manufacturer with significant speech recognition training data needed to program and optimize the speech recognition software.
Voice control systems are only as good as their speech recognition. The biggest challenge is optimizing and training these speech recognition systems to react to the large variety of voice commands. Programming that does not account for “human reason” and “human behavior” cannot produce an ideal speech recognition system. In many cases, the users’ voice commands are not recognized, or they are misunderstood.
The users must often enter their commands several times before the system reacts to the entry correctly and displays the desired information. This is time-consuming for the user and distracting while driving.
To recognize the individual voice commands of potential users, the system must be optimized with speech recordings from thousands of different people, each with their own commands and pronunciation.
Thousands of our Clickworkers from different countries and regions record how they would issue a command to call up a predefined reaction x or information y via the infotainment system. Every voice recording differs, even within the same language, because of each Clickworker’s choice of words, word order, and pronunciation.
Speech recognition algorithms must also be trained to react to certain cues such as keywords. In a second step, our Clickworkers therefore transcribe all the voice recordings and analyze the sentences to identify the keywords used and their frequency.
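The keyword analysis in this second step can be illustrated with a minimal Python sketch. It is not the tooling used in the project: the function name `keyword_frequencies`, the stop-word list, and the sample transcripts are all hypothetical, and a real analysis would need per-language word lists and more careful tokenization.

```python
from collections import Counter
import re

# Hypothetical stop-word list for illustration only; a real project
# would maintain a separate list for each target language.
STOP_WORDS = {"the", "a", "an", "to", "me", "please", "i", "of", "in", "on", "for", "my"}


def keyword_frequencies(transcripts):
    """Count how often each non-stop-word appears across all transcripts."""
    counts = Counter()
    for text in transcripts:
        # Lowercase the text and split it into simple word tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(token for token in tokens if token not in STOP_WORDS)
    return counts


# Invented example transcripts: three ways of asking for the same thing.
transcripts = [
    "Navigate to the nearest gas station",
    "Please find a gas station nearby",
    "Take me to the closest petrol station",
]

freq = keyword_frequencies(transcripts)
print(freq.most_common(3))
```

Even in this small sample, the three commands share only the word "station", which illustrates why many recordings per scenario are needed to find the keywords that users actually rely on.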
With the help of these recordings, manufacturers train their speech recognition software and optimize the infotainment system to respond to the different ways individual users operate it.
Clickworker qualifications: Native speakers from the target regions
Languages: 9 languages
Number of voice recordings (in MP4 format): 810,000 (600 recordings per language for each of 150 scenarios)
Tasks: 1. Create the audio recordings; 2. Transcribe the recordings; 3. Analyze and evaluate the recordings
Quality assurance: A second Clickworker, the transcriber, checks the quality of the recordings
Data transfer: via XLS file
Speech recognition offers many useful applications that can make day-to-day activities easier. Whether it is used to search for something online, unlock a smartphone, or operate a car infotainment system, more and more programs rely on voice input. This poses challenges for software development. Since every person speaks differently based on their dialect, individual mannerisms, or potential speech impediments, the program needs to be trained to recognize the same words in many variations. This is why the human factor plays such an important role in gathering speech recognition training data. Simply using one recording to train the system would not yield the desired results. Instead, we provide a multitude of different voice recordings that can help the machine learn. Once this foundation has been laid, the software can use the training data to come to the right conclusions and keep evolving.