An increasing number of intelligent systems, ranging from smartphones, vehicle infotainment systems, and tablet and smartphone applications to household devices and building services technology, are controlled via voice input.
However, many voice control systems are very error-prone. The human factor has often been disregarded during programming. Human beings do not always apply the same logic; they express themselves differently according to their language skills, nationality, social environment and educational background. As soon as a spoken command deviates from the command the system was programmed to expect, whether in word choice, word order or pronunciation, the user is not understood and the command is not carried out. Aborted attempts and repeated speech input are time-consuming for the user and, in some situations such as driving, distracting and dangerous.
To optimally adjust voice control systems to the behavior and pronunciation of the users, behavior patterns of many different people have to be determined and taken into consideration. How do different individuals proceed when they operate the systems? Which commands do they enter via speech recognition to call up specific information and which words do they select to do so, and in which order? How are the individual words pronounced? Crowdsourcing by clickworker provides an ideal and efficient data collection tool to obtain valid data quickly.
Detailed information about our service “Audio data sets for speech recognition training”
Ask our international crowd of 1.5 million Clickworkers how they proceed when they operate their system and which speech commands they would give to call up specific information. By using our crowdsourcing services, you will receive all the data you need from potential users of your system in sufficient quantities, broken down by nationality and regional language differences. Furthermore, the results can be classified by other demographic data of our Clickworkers, for example age group or gender.
This newsletter presents a case study that demonstrates how crowdsourcing can be used to train speech recognition systems to react to human behavior and make them more intelligent.
Thousands of Clickworkers record their speech input to control a car infotainment system and supply the manufacturer with these important data for the programming and optimization of the system.
Programming without the “human understanding” and “human behavior” factors cannot yield an optimal speech recognition system. Often, the users’ speech entries are not recognized, or they are misunderstood. The users must often enter their commands several times before the system reacts to the entry correctly and displays the desired information. This is time-consuming for the user and is often distracting while driving.
German native speakers from Germany, Switzerland and Austria.
Equipment needed by the Clickworker:
PC or laptop with a microphone and loudspeakers.
Audio files via Cloud.
Typical number of daily speech recordings:
500 – 600 recordings.
40,000 speech recordings (1,000 speech recordings per task/target; 40 tasks/target).
For further questions about our services or offer requests, please send us an e-mail to: email@example.com
or give us a call at: +49 201 959718-0.
This article was written by Ines on August 4, 2014.
Ines Maione is responsible for marketing and is the contact person for PR.