For an artificial intelligence to develop its deep learning abilities, it first needs to accumulate massive amounts of data in which to identify objects.
But to an AI, a picture of a cat, for instance, carries no information unless the AI is taught that the object is a cat.
That's where 'data labeling' comes in: the manual, human process of sorting and tagging data with names.
"For an AI to recognize an object, you need at least one million data points. For every object in the world, there is an endless amount of data to process, and that's why the industry has infinite potential."
So-called data labelers process data in the form of text, images, video and audio.
They draw a box around an object in an image, or mark individual pixels for more delicate tasks, and then give the object a name.
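As a rough illustration of what one such box label might look like as data, here is a minimal Python sketch. The class name `BoxLabel`, its field names, and the pixel coordinates are all hypothetical, chosen for this example; they do not come from any particular labeling tool mentioned in the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxLabel:
    """One hypothetical label: a named rectangle in pixel coordinates."""
    name: str    # the tag a labeler assigns, e.g. "cat"
    x_min: int   # left edge of the box
    y_min: int   # top edge of the box
    x_max: int   # right edge of the box (exclusive)
    y_max: int   # bottom edge of the box (exclusive)

    def __post_init__(self):
        # Reject boxes with zero or negative width or height.
        if self.x_min >= self.x_max or self.y_min >= self.y_max:
            raise ValueError("box must have positive width and height")

    def area(self) -> int:
        """Number of pixels covered by the box."""
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

    def contains(self, x: int, y: int) -> bool:
        """True if pixel (x, y) falls inside the box."""
        return self.x_min <= x < self.x_max and self.y_min <= y < self.y_max


# A labeler boxes the cat in a photo and gives it a name.
cat = BoxLabel(name="cat", x_min=40, y_min=30, x_max=200, y_max=180)
```

A training pipeline would typically store many such records per image; the pixel-level marking used for more delicate tasks would need a finer structure than a single rectangle.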
"Anyone can label data with proper training and guidelines. You just need perseverance because most of the work is repetitive."
"The data labeling industry can also provide employment for retired seniors or women on career breaks who can easily manage the simple task of sorting data."
Part of the New Deal is a so-called 'Digital Dam,' which will collect data for use by the public and private sectors in their AI and 5G projects.
The Digital Dam will create some 390-thousand jobs in all, 75 percent of which will be in the all-important process of data labeling.
This will lead to improvements in various fields such as self-driving cars, which need to recognize and differentiate between moving cars, pedestrians and the road.
Once doctors have labeled data related to diseases and their symptoms, an AI will be able to read X-rays and MRI scans and make diagnoses on its own.
Humans make mistakes, though, so an AI is in development that can do the labeling too.
"Auto labeling can accomplish up to 90 percent of the task. But it doesn't threaten jobs for people because there's just so much data out there to process. Only with these techniques can we meet demand in the industry in terms of price and speed."
An added benefit is that data labeling can be done without in-person contact, so the industry expects more jobs to be created in the post-coronavirus era.
Choi Jeong-yoon, Arirang News, Daejeon.