Photo data sets for training software to perform automated online identity verification via facial recognition

Case study – Provision of thousands of photo sets, each showing one person over a period of at least 5 and at most 20 years.

Thousands of Clickworkers from various continents submit numerous photos, in each of which their own face is clearly visible. The digital photos in each set were taken over a period of at least 5 and at most 20 years. With these photo sets, an international software company is now able to train an AI system to reliably recognize and identify faces of all ethnicities and genders as they change over the course of a lifetime.

The Challenge

AI facial recognition systems are increasingly being used to leverage the uniqueness of a face as a biometric factor for identity verification and authentication in online login processes. The growing use of biometric facial recognition for the purpose of authentication is primarily based on the fact that, unlike traditional solutions such as verification emails, passwords, fingerprints or even simple selfies, it uses unique mathematical and dynamic patterns. As a result, it is one of the most reliable authentication systems.

The algorithms on which these AI systems are based must be trained through machine learning on an enormous amount of data, in the form of photographs and/or videos of people, until they are able to identify people unambiguously by their faces with an error rate close to zero.

During the learning process, a multilayer neural network processes the training data and adjusts its face recognition parameters until a person can be clearly identified. This requires not only large numbers of photographs and videos, but also a wide variety of people depicted, corresponding to the diversity of people in the regions where the system will be deployed. In addition, the training data for a biometric face recognition system must show faces in different sizes and from various perspectives and angles. Finally, a system used for authentication must be able to recognize a face at all times, even though the face changes naturally over the years, to a greater or lesser extent.

The Solution

We set up a custom-fit project on our in-house online platform in close consultation with the customer. This results in paid jobs/tasks for our registered crowdworkers, so-called Clickworkers. These tasks are made visible for 850,000 Clickworkers who match the demographics specified by the client.

Thousands of Clickworkers work on the project in accordance with the instructions in the concise and descriptive task briefing. After specifying their ethnicity, each Clickworker who has accepted the task first records two new, short videos of themselves. They film their face once with and once without glasses. While doing so, they slowly move their head in all directions and say a short sentence.

In the second step, each Clickworker uploads these two videos, together with 60 to 200 existing digital photos of themselves in which their face is clearly recognizable, as a set to our platform. No two photos in a set were taken on the same day, no photo is repeated, and together the photos cover a period of at least 5 and at most 20 years. The photos differ in the perspective or angle from which the person’s face is seen, in styling (e.g. hairstyle, clothing, glasses, makeup), in facial expression, and in lighting conditions.
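The rules for a photo set described above can be expressed as a small automated check. The following is only a minimal sketch, not the platform's actual implementation: the `Photo` record, the use of a content hash to detect repeated photos, and the conversion of days to years with 365.25 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Photo:
    """Hypothetical record for one uploaded photo (not a platform type)."""
    taken_on: date       # capture date, e.g. read from the image metadata
    content_hash: str    # digest of the file bytes, used to spot repeats


def validate_photo_set(photos, min_count=60, max_count=200,
                       min_years=5, max_years=20):
    """Return a list of rule violations; an empty list means the set passes."""
    problems = []

    # Rule 1: 60 to 200 photos per set.
    if not min_count <= len(photos) <= max_count:
        problems.append(
            f"expected {min_count}-{max_count} photos, got {len(photos)}")

    # Rule 2: no two photos taken on the same day.
    days = [p.taken_on for p in photos]
    if len(set(days)) != len(days):
        problems.append("two or more photos were taken on the same day")

    # Rule 3: no photo is repeated (assumption: identical bytes = repeat).
    hashes = [p.content_hash for p in photos]
    if len(set(hashes)) != len(hashes):
        problems.append("at least one photo is repeated")

    # Rule 4: oldest and newest photo lie 5 to 20 years apart.
    if days:
        span_years = (max(days) - min(days)).days / 365.25
        if not min_years <= span_years <= max_years:
            problems.append(
                f"set spans {span_years:.1f} years, "
                f"expected {min_years}-{max_years}")

    return problems


# Illustrative data: 60 photos, one every 40 days (about 6.5 years in total).
example_set = [
    Photo(date(2015, 1, 1) + timedelta(days=40 * i), f"hash-{i}")
    for i in range(60)
]
print(validate_photo_set(example_set))  # prints []
```

A check like this can only cover the countable rules. Whether the face is clearly recognizable, and whether perspective, styling, and lighting vary enough, still needs a human or a separate model.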

To ensure that the specifications are met, all uploaded videos and photos are checked by our quality management team and selected accordingly. The flawless sets are then transferred to the customer directly via an API connection.

This provides the software developer with over 300,000 photos of faces and over 6,000 highly diverse videos promptly and efficiently. The software company uses this data to train an AI system to recognize faces until the error rate approaches zero and the system can be used for secure online authentication.

More information about our “Photo datasets for training facial recognition software” service

Project Data

Number of photos: >300,000 photos (50 – 200 per Clickworker/set)

Time frame of the photos: Each set should cover a period of at least 5 and at most 20 years, with about 10 diverse photos per year.

Number of videos: >6,000 (2 per Clickworker/set) of approx. 30 seconds each

Proportion of Clickworkers by ethnicity: Africa 20%, South Asia 20%, Far East 20%, Latin America 20%, Caucasia 10%, Other 10%.

Photo format and size: Minimum size of the depicted head on the photos: 200 x 200 pixels, jpg or heic, landscape or portrait format

Interface versions of jobs:
Clickworker App (Smartphone Version)
Clickworker Workplace (Desktop Version)

Tasks:
1. Specification of own ethnicity (via dropdown selection)
2. Creation of two short portrait selfie videos, one with and one without glasses (approx. 30 seconds each)
3. Provision/upload of 60 – 200 photos from the specified period, in which their face is easy to recognize
4. Confirmation of having read the job terms and conditions, and declaration of consent to them

Quality assurance: Quality check by clickworker’s quality management team

Data transfer: Data transfer via API
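The ethnicity proportions in the project data translate into a fixed number of participants per group once a total is chosen. The sketch below shows one way to compute such quotas with largest-remainder rounding, so that the group sizes always add up to the total. The function name `allocate_quota` and the total of 3,000 participants are assumptions for illustration; the project data only states more than 6,000 videos at two per Clickworker.

```python
# Percentage shares as listed in the project data.
SHARES = {
    "Africa": 20,
    "South Asia": 20,
    "Far East": 20,
    "Latin America": 20,
    "Caucasia": 10,
    "Other": 10,
}


def allocate_quota(total, shares=SHARES):
    """Split `total` participants by percentage share.

    Uses largest-remainder rounding with integer arithmetic, so the
    returned group sizes always sum to `total`.
    """
    if sum(shares.values()) != 100:
        raise ValueError("shares must add up to 100 percent")

    # divmod gives the whole seats and the remainder (in hundredths).
    parts = {group: divmod(total * pct, 100) for group, pct in shares.items()}
    quota = {group: whole for group, (whole, _) in parts.items()}

    # Hand out the seats lost to rounding, largest remainder first.
    leftover = total - sum(quota.values())
    by_remainder = sorted(parts, key=lambda g: parts[g][1], reverse=True)
    for group in by_remainder[:leftover]:
        quota[group] += 1

    return quota


# Illustrative total only (assumption, see lead-in).
print(allocate_quota(3000))
```

With a total of 3,000, each 20% group receives 600 participants and each 10% group receives 300.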

The Project Workflow — In Brief

  1. Project meeting with the customer. The resulting tasks are defined and recorded in a precise briefing for the Clickworkers.
  2. clickworker sets up the project. The photo creation tasks become visible as individual jobs on the clickworker platform for qualified Clickworkers.
  3. Processing of the jobs by Clickworkers. Numerous Clickworkers of the specified ethnicities accept the jobs in parallel, record the videos and compile the photos according to the briefing, and upload them to the clickworker platform with the required information.
  4. The clickworker quality management team checks the photos and videos for compliance with the specifications/briefing.
  5. Transmission of all correct photo sets with videos to the customer via an API connection.
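Part of the compliance check in step 4 concerns measurable properties, such as the file format and the minimum head size of 200 x 200 pixels named in the project data. A pre-check for these two properties might look like the sketch below. It is an illustration only and not a description of clickworker's quality management process: the function name is hypothetical, and the head dimensions are assumed to come from a reviewer or a separate face detector that is not shown here.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".jpg", ".heic"}  # formats named in the project data
MIN_HEAD_PX = 200                        # minimum head width and height


def precheck_photo(filename, head_width_px, head_height_px):
    """Return True if file type and head size meet the stated minimums.

    head_width_px and head_height_px are assumed inputs, e.g. measured by
    a face detector or a human reviewer; this sketch detects nothing itself.
    """
    # Compare extensions case-insensitively, so "IMG.JPG" is accepted.
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        return False
    return head_width_px >= MIN_HEAD_PX and head_height_px >= MIN_HEAD_PX


print(precheck_photo("IMG_0001.JPG", 240, 260))   # prints True
print(precheck_photo("portrait.png", 300, 300))   # prints False
print(precheck_photo("holiday.heic", 199, 300))   # prints False
```

Files ending in `.jpeg` are rejected here because the project data lists only jpg and heic; whether they should be accepted is a decision for the briefing.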

Benefits

  • Quick acquisition of highly diverse, high-quality, application-specific AI training data
  • Global data sourcing and market coverage across all continents
  • International diversity of photographs of people — access to a broad-based crowd from all ethnic origins, ages and genders
  • Managed Service – Customer-specific project implementation and execution
  • Personal contact and advice
  • Quality-assured results
  • Easy data transfer
  • Scalable throughput