Data are the foundation for training algorithms. The more realistic the data, the better the results, because artificial intelligence depends on precise and reliable information to train its algorithms. This is obvious, but it is often overlooked. Training data are realistic when they reflect the data that the AI system gathers in real operation. Unrealistic data sets hinder machine learning and lead to expensive misinterpretations.
Artificial neural networks need to be fed good input to be able to learn, just like the human brain. Artificial intelligence consists of algorithms that are fed data from which they are meant to learn, which is known as machine learning. Ultimately, the data used to train a system determine what it knows and what it can accomplish. Using artificially created or open data as training data carries a great risk of distorted results, because such data are often not realistic. If the data do not match the conditions under which the system will be used, the system can deliver insufficient or incorrect results, as the following example illustrates.
While developing software for drone cameras, the developers make use of photographs found on the Internet. Such photos exist in ample supply on Facebook or Instagram. However, these photos have two typical features:
A self-learning algorithm will draw incorrect conclusions from these features. The patterns it treats as general are not useful for assessing photos taken by a drone camera; at worst they may even be harmful. In this example, the algorithm might learn that important objects are always at the center of the image, which is a false conclusion: photographs taken by drones are taken from various perspectives and distances.
Another example: To train automobile software for the German market, the development team uses photos of traffic situations taken worldwide. In this case there is a risk that, in practice, the artificial neural network mistakes an advertising poster that resembles a foreign traffic sign for a road sign.
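The center-of-image risk described above can be checked before training. The following is a minimal sketch, assuming the data set already has bounding-box annotations given as (x_min, y_min, x_max, y_max) in pixels; the function names and the threshold of 0.25 are hypothetical and chosen only for illustration.

```python
from math import hypot


def center_offset(box, image_width, image_height):
    """Normalized distance of a box center from the image center.

    0.0 means the box is centered exactly, 1.0 means its center sits in a corner.
    """
    x_min, y_min, x_max, y_max = box
    # Offset of the box center from the image center, scaled to [-1, 1] per axis.
    dx = ((x_min + x_max) / 2 - image_width / 2) / (image_width / 2)
    dy = ((y_min + y_max) / 2 - image_height / 2) / (image_height / 2)
    return hypot(dx, dy) / hypot(1, 1)


def centered_share(annotations, threshold=0.25):
    """Share of annotated objects whose center lies close to the image center.

    `annotations` is a list of (box, image_width, image_height) tuples.
    `threshold` is an assumed cut-off, not an established standard.
    """
    if not annotations:
        raise ValueError("annotations must not be empty")
    offsets = [center_offset(box, w, h) for box, w, h in annotations]
    return sum(offset <= threshold for offset in offsets) / len(offsets)
```

If the share is much higher for the training photos than for a sample of real drone images, the training set probably carries the center bias described above.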
How does one identify poor training data sets? The following signs, for instance, can be indications:
The solution is to gather the data oneself or to have them newly gathered by a provider. This way the data can be collected to meet one's own requirements. Alternatively, or in addition, existing data sets can be examined to see whether they are suitable for the respective system. They are suitable when they correspond to the input that the system receives in operation and is expected to recognize and evaluate correctly.
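One simple way to examine whether an existing data set corresponds to operational input is to compare how often each class occurs in both. The sketch below is only an illustration under stated assumptions: it assumes that class labels are available for the training set and for a sample of operational data, and it uses the total variation distance as the comparison measure. The function names are hypothetical.

```python
from collections import Counter


def class_shares(labels):
    """Return the relative frequency of each label in a list of labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(training_labels, operational_labels):
    """Distance between two label distributions.

    0.0 means identical class shares, 1.0 means no class in common.
    """
    p = class_shares(training_labels)
    q = class_shares(operational_labels)
    all_labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in all_labels)
```

A value close to 0 suggests that the class mix is similar; a large value is a hint that the data set should be examined more closely. It does not show that the images themselves are realistic, only that the label proportions match.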
At clickworker you can have your AI training data newly generated – to meet your individual requirements and tailored to the specifications of your system.
The quality of training data can be verified based on the following questions:
The crowd is especially well suited to generating and verifying training data for systems with artificial intelligence. In principle there are three individual approaches, which can also be combined:
Inadequate data can also be optimized for use as training data at a later date. Within a short period of time, Clickworkers can process raw data – add keywords and tags, use bounding boxes, polygons and key points to annotate elements on images, or carry out semantic segmentations.
The data sets and results are subsequently checked by means of various procedures, including peer review, the dual control principle and majority decision.
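A majority decision over several annotators' labels can be sketched as follows. This is a minimal illustration and not clickworker's actual procedure; the function name and the `min_agreement` parameter are assumptions.

```python
from collections import Counter


def majority_label(votes, min_agreement=0.5):
    """Return the label chosen by more than `min_agreement` of the votes.

    Returns None when no label reaches the required share, so that the
    item can be sent back for another review round.
    """
    if not votes:
        raise ValueError("votes must not be empty")
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) > min_agreement:
        return label
    return None
```

For example, `majority_label(["car", "car", "truck"])` returns `"car"`, while `majority_label(["car", "truck"])` returns `None` because neither label has a strict majority.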
More information about the clickworker “AI training data” service.
The main risk of unrealistic data is that they can corrupt an entire algorithm. This is similar to the human brain: if the basic assumptions and information turn out to be incorrect, then the hypotheses and worldviews built on them are also incorrect. As a consequence, the machine, like the brain, has to start all over again. In the case of the machine this can be very expensive. No company can afford to use an unsafe technology. It is therefore advisable to pay attention to the quality of the training data sets from the outset to avoid these unnecessary costs.
This article was written on 14 May 2019 by Jan Knupper.