Ines Maione
Optimization of speech recognition systems
A growing number of intelligent systems are controlled by voice input, from smartphones and tablet applications to in-vehicle infotainment systems, household devices and building services technology.
However, many voice control systems are very error-prone. The human factor has often been disregarded during programming. People do not all apply the same logic; they express themselves differently depending on their language skills, nationality, social environment and educational background. As soon as a spoken command deviates from the command the system was programmed to expect, whether in word choice, word order or pronunciation, the user is not understood and the command is not carried out. Aborted attempts and repeated speech input cost the user time and, in some situations such as driving, are distracting and dangerous.
To adjust voice control systems optimally to the behavior and pronunciation of their users, the behavior patterns of many different people have to be determined and taken into consideration. How do different individuals proceed when they operate the systems? Which spoken commands do they use to call up specific information, which words do they choose, and in what order? How are the individual words pronounced? Crowdsourcing by clickworker provides an efficient data collection tool for obtaining valid data quickly.
Detailed information about our service “Audio data sets for speech recognition training”
Ask our international crowd of 1.5 million Clickworkers how they proceed when they operate their system and which speech commands they would give to call up specific information. With our crowdsourcing services you receive the data you need from potential users of your system in valid quantities, broken down by nationality and regional language differences. The results can also be classified by other demographic data of our Clickworkers, for example age group or gender.
This newsletter presents a case study that demonstrates how crowdsourcing can be used to train speech recognition systems to react to human behavior and make them more intelligent.
Optimization of speech recognition systems with crowdsourcing
Thousands of Clickworkers record their speech input to control a car infotainment system and supply the manufacturer with these important data for the programming and optimization of the system.
Challenge
Voice control systems are often only as good as their speech recognition. The challenge is to train and optimize that recognition so that it copes with the different ways users phrase and speak their input. Programming that ignores the factors “human understanding” and “human behavior” cannot yield an optimal speech recognition system. Often, users’ speech entries are not recognized or are misunderstood, and users must repeat a command several times before the system reacts correctly and displays the desired information. This is time-consuming and often distracting while driving. To broaden the range of inputs the system recognizes, speech recordings from thousands of different people, each with their own wording and pronunciation, are needed.
[Figure: Exemplary workflow]
Solution
Thousands of our Clickworkers in the German-speaking world record how they would issue a command to call up the predefined reaction “x” or information “y” via the infotainment system. Every recording differs in word choice, word order and the pronunciation of the individual Clickworker. The recordings help to train the system’s speech recognition and to optimize the infotainment system for the different ways individual users handle it.
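To make the idea concrete, the following is a minimal sketch of how such recordings might be organized once they have been transcribed. It is not clickworker's or the manufacturer's actual pipeline: the `Recording` fields, the target names, the region tags and the sample utterances are all invented for illustration (and given in English for readability). It groups the different wordings collected for each predefined reaction and counts recordings per region, so that under-represented groups become visible.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    """One crowd-sourced utterance. All fields are hypothetical."""
    target: str       # predefined reaction the speaker wanted to trigger
    transcript: str   # what the speaker actually said
    region: str       # demographic tag delivered with the recording
    age_group: str    # demographic tag delivered with the recording


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so identical wordings compare equal."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def phrasings_by_target(recordings):
    """Count how often each distinct wording was used per target reaction."""
    variants = defaultdict(Counter)
    for rec in recordings:
        variants[rec.target][normalize(rec.transcript)] += 1
    return variants


def coverage_by_region(recordings):
    """Count recordings per (target, region) pair."""
    return Counter((rec.target, rec.region) for rec in recordings)


# Invented sample data, not taken from the case study.
recordings = [
    Recording("navigate_home", "Take me home.", "DE", "18-29"),
    Recording("navigate_home", "Navigate home", "AT", "30-49"),
    Recording("navigate_home", "take me home", "CH", "50+"),
    Recording("play_radio", "Turn on the radio!", "DE", "30-49"),
]

variants = phrasings_by_target(recordings)
coverage = coverage_by_region(recordings)

for target, counts in variants.items():
    print(target, dict(counts))
```

Every distinct wording under one target is a phrasing the recognizer should map to the same reaction, while the coverage counts indicate where more recordings from a given region may still be needed.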
Project specifications
Clickworker qualifications:
Equipment needed by the Clickworker:
Data transfer:
Typical number of daily speech recordings:
Quality assurance:
Project volume:
For further questions about our services, or to request an offer, please send an e-mail to request@clickworker.com or give us a call at +49 201 959718-0.