
If data is the new oil, these companies are the new Baker Hughes

By Jeremy Kahn, Editor, AI
February 4, 2020, 7:00 AM ET

Artificial intelligence runs on data. And today, in most cases, that data needs to be labeled by humans.

This is particularly true when using computer vision to identify tumors on medical scans, spot roof damage from aerial photography or figure out whether an object crossing in front of your self-driving car is a plastic bag or a mother pushing a stroller. But it’s also true for speech recognition: To train the software, someone must provide an accurate transcript to match an audio recording.

Data labeling for machine learning has spawned an entirely new industry, and the companies springing up to help businesses label their data are among the hottest “picks and shovels” investment plays for venture capitalists hoping to cash in on the current A.I. gold rush.

The latest datapoint in this data labeling boom: Labelbox, a San Francisco startup that operates a software platform for helping companies manage their data labeling tasks, on Tuesday announced it had received $25 million in additional venture capital funding.

The money is from prominent Silicon Valley venture capital firm Andreessen Horowitz, whose managing partner Peter Levine is joining Labelbox’s board; Google’s A.I.-focused venture capital fund, Gradient Ventures; and Kleiner Perkins, another of the Valley’s best-known firms.

The investment, which is Labelbox’s Series B, or second round of institutional financing, brings the total that the not-quite-two-year-old startup has raised to $39 million.

Labelbox competes with a number of other labeling companies. There’s Scale AI, another San Francisco data labeling platform that has raised $122 million since its founding three years ago. There are also companies that specialize in running teams of human data labelers on a project basis, such as Hive, CloudFactory, and Samasource. Samasource was founded by Leila Janah, who died last month at age 37 and who saw data labeling as a way to bring decent wages and skilled work to people in the developing world.

Alexandr Wang, the 23-year-old founder and CEO of Scale AI, which has worked with a number of self-driving car companies, says that the “dirty secret” of artificial intelligence is that getting the software to work well in the real world requires a large amount of high-quality data.

“Where the rubber hits the road is what does the data these A.I. systems are trained on look like?” he says. “Is that data biased? Is that data high quality? Does that data have noise? Is that data comprehensive?”

Providing labels can be relatively low-skilled work (identifying “cats” in videos) performed by thousands of contractors in traditional outsourcing hubs such as India, Romania, or the Philippines, or it can be much higher-skilled work performed by radiologists (outline the exact contours of a tumor on a medical scan) or lawyers (identify a non-compete clause in a contract). Often companies have a need for both general and more expert labeling and employ a combination of outsourcing firms, freelancers, and in-house experts to affix these annotations. The labels can be in the form of bounding boxes around objects, tagging items visually or with text labels in photographs, or entering a classification into a separate text-based database that accompanies the original data.
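The two label forms described above can be pictured as simple records. The following is a minimal Python sketch, not any vendor's actual format: the class names, field names, and sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """A rectangular label around one object in an image (pixel coordinates).

    Hypothetical structure for illustration only.
    """
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def area(self) -> int:
        # Width times height; a degenerate box has zero area.
        width = max(0, self.x_max - self.x_min)
        height = max(0, self.y_max - self.y_min)
        return width * height


@dataclass
class Classification:
    """A label kept in a separate record that points back to the original data."""
    item_id: str
    label: str


# A low-skill label: a box around a cat in a video frame.
box = BoundingBox(label="cat", x_min=40, y_min=60, x_max=200, y_max=180)

# An expert label: a lawyer tags a clause in a contract (made-up item ID).
record = Classification(item_id="contract-017", label="non-compete clause")
```

In this sketch the bounding box lives alongside the image coordinates, while the classification record stands apart from the data and refers to it by ID, mirroring the "separate text-based database" the paragraph mentions.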

Wang says that with such complex workflows, data governance (how companies track what data they are using, who’s using it, and what they are doing with it) is critical. “It isn’t sexy, but it really matters,” he says. Companies trying to deploy machine learning are often slowed because they don’t have systems in place to manage data labeling efficiently, he says.

Both Scale AI and Labelbox provide tools to help companies’ machine learning and data science teams analyze the data once it is labeled, allowing them to identify blind spots and biases. For example, are men overrepresented in your X-ray data (bias)? Or did you have too few examples of cats running across the road in order to train your self-driving algorithm to brake for them (a blind spot)? “Every A.I. company needs tools to edit, manage, and review labels,” Manu Sharma, Labelbox’s co-founder and CEO, says.
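One simple way to look for this kind of imbalance is to count how often each label appears and flag the rare ones. The Python sketch below is an illustration only, not how Scale AI or Labelbox do it: the function names, the sample labels, and the 5% cutoff are assumptions made up for the example.

```python
from collections import Counter


def label_shares(labels):
    """Return each label's share of the dataset as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def underrepresented(labels, min_share):
    """List labels whose share falls below min_share (a hypothetical cutoff)."""
    shares = label_shares(labels)
    return sorted(label for label, share in shares.items() if share < min_share)


# Made-up sample: 100 labeled road-scene objects.
sample = ["car"] * 90 + ["pedestrian"] * 8 + ["cat"] * 2

shares = label_shares(sample)
rare = underrepresented(sample, min_share=0.05)
print(rare)
```

With this invented sample, cats make up 2% of the labels and fall under the assumed 5% cutoff, so they are the only label flagged as a possible blind spot.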

Michael Phillippi, vice president of technology at Lytx, a San Diego company that sells systems that allow trucking businesses to assess and track drivers’ behavior through cameras and sensor data, says it takes about 10,000 hours of labeled 20-second video clips to train a prototype A.I. system to detect something like driver distraction. To put that system into actual production, though, requires four to five million hours of video, he says. That is a lot of labeling.
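To get a feel for the scale, the hours can be converted into a count of 20-second clips. This short Python sketch assumes the figures refer to total footage duration and that clips do not overlap; neither assumption is confirmed in the article, and the function name is hypothetical.

```python
CLIP_SECONDS = 20  # clip length cited by Lytx


def clips_for_hours(hours: int, clip_seconds: int = CLIP_SECONDS) -> int:
    """Number of back-to-back clips that fit in the given hours of video."""
    return hours * 3600 // clip_seconds


prototype_clips = clips_for_hours(10_000)        # prototype system
production_low = clips_for_hours(4_000_000)      # low end for production
production_high = clips_for_hours(5_000_000)     # high end for production

print(f"{prototype_clips:,}")   # 1,800,000
print(f"{production_low:,}")    # 720,000,000
print(f"{production_high:,}")   # 900,000,000
```

Under those assumptions, a prototype needs about 1.8 million labeled clips, and a production system needs 400 to 500 times as many.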

John-Isaac Clark is the CEO of Arturo.ai, a spinout from American Family Insurance that specializes in machine learning software to analyze images, including satellite and aerial photography, for the insurance industry. He says that large, well-labeled data sets are especially important for training A.I. software to correctly identify “edge cases,” meaning unusual or rare situations.

Humans can often use common sense to deal with these situations, even when they haven’t encountered them before. Most A.I. systems, in contrast, need to have seen multiple examples during training to correctly handle them.

Both Arturo and Lytx are Labelbox customers. Clark says Labelbox enabled Arturo to reduce the number of employees it needed to supervise its data labeling contractors from four to just one.

Sharma and his co-founder Brian Rieger, who is now Labelbox’s chief operating officer, met when they both worked in the aeronautics industry, helping to design and test flight control systems. Sharma later worked for Planet Labs, a company that analyzes gigantic datasets of satellite images, where he realized the difficulty companies had with managing labeling tasks for A.I. training data and began thinking of creating a company to address this problem. His other co-founder, Dan Rasmuson, now Labelbox’s chief technology officer, had encountered similar problems working at a company that sold drone imagery.

Labelbox’s software supplies a set of labeling tools for both images and text. It also distributes data to labelers so that several of them can work on the same data simultaneously without duplicating any labels.
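The article does not say how Labelbox avoids duplicate work. One common approach, sketched here in Python purely as an illustration, is a shared queue that hands each item to exactly one labeler. The `LabelQueue` class, its methods, and the item and labeler names are all hypothetical and are not Labelbox's API.

```python
from collections import deque


class LabelQueue:
    """Hands out each unlabeled item once, so no two labelers get the same one.

    Hypothetical sketch; a real system would also need locking, timeouts,
    and reassignment of abandoned items.
    """

    def __init__(self, item_ids):
        self._pending = deque(item_ids)
        self.assigned = {}  # item ID -> labeler who received it

    def next_item(self, labeler):
        """Return the next pending item for this labeler, or None when done."""
        if not self._pending:
            return None
        item = self._pending.popleft()
        self.assigned[item] = labeler
        return item


queue = LabelQueue(["img-1", "img-2", "img-3"])
first = queue.next_item("ana")   # "img-1"
second = queue.next_item("ben")  # "img-2"
third = queue.next_item("ana")   # "img-3"
fourth = queue.next_item("ben")  # None, nothing left to label
```

Because an item leaves the pending queue the moment it is handed out, two labelers pulling work at the same time in this sketch never receive the same item.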

Some companies in the labeling space, such as Scale AI and Hive, provide labeling services themselves. In fact, Scale AI uses its own A.I. software to automatically generate labels for certain kinds of data. These labels are then checked by humans to ensure accuracy, Wang says.

Automatic labeling, he says, allows Scale AI’s customers to benefit from the work Scale AI has done in the past—if it has already built a system to detect cars in videos, for instance, customers may not need to train their own system from scratch. Even in cases where customers want to build their own models, he says, automatic labeling makes the process more efficient.

Labelbox, meanwhile, has taken a different approach. It doesn’t perform any labeling itself. Instead, it’s a tool for managing labeling projects and data across different contract labelers, who often work for large outsourcing firms. The software also allows Labelbox’s customers to audit the quality of labeling contractors. Labelbox gets paid based on how much data a customer runs through the software.

Andreessen Horowitz’s Levine compares Labelbox to GitHub, the software code repository that many companies use to manage their code. Acquired by Microsoft for $7.5 billion in 2018, it was an Andreessen Horowitz investment. “Labelbox has the potential to fill a similar role for data in the AI/ML world,” Levine writes in response to emailed questions, using shorthand for artificial intelligence and machine learning. He says the platform can serve as “a single source of truth” for training data across an organization.

This story has been updated to correct the spelling of Labelbox chief technology officer Dan Rasmuson’s last name.


About the Author

Jeremy Kahn is the AI editor at Fortune, spearheading the publication's coverage of artificial intelligence. He also co-authors Eye on AI, Fortune’s flagship AI newsletter.
