
If data is the new oil, these companies are the new Baker Hughes

By Jeremy Kahn, Editor, AI
February 4, 2020, 7:00 AM ET

Artificial intelligence runs on data. And today, in most cases, that data needs to be labeled by humans.

This is particularly true when using computer vision to identify tumors on medical scans, spot roof damage from aerial photography or figure out whether an object crossing in front of your self-driving car is a plastic bag or a mother pushing a stroller. But it’s also true for speech recognition: To train the software, someone must provide an accurate transcript to match an audio recording.

Data labeling for machine learning has spawned an entirely new industry, and the companies springing up to help businesses label their data are among the hottest “picks and shovels” investment plays for venture capitalists hoping to cash in on the current A.I. gold rush.

The latest data point in this data labeling boom: Labelbox, a San Francisco startup that operates a software platform for helping companies manage their data labeling tasks, announced on Tuesday that it had received $25 million in additional venture capital funding.

The money is from prominent Silicon Valley venture capital firm Andreessen Horowitz, whose managing partner Peter Levine is joining Labelbox’s board; Google’s A.I.-focused venture capital fund, Gradient Ventures; and Kleiner Perkins, another of the Valley’s best-known firms.

The investment, which is Labelbox’s Series B, or second round of institutional financing, brings the total that the not-quite-two-year-old startup has raised to $39 million.

Labelbox competes with a number of other labeling companies. There’s Scale AI, another San Francisco data labeling platform, which has raised $122 million since its founding three years ago. There are also companies that specialize in running teams of human data labelers on a project basis, such as Hive, Cloudfactory, and Samasource. Samasource was founded by Leila Janah, who died last month at age 37 and who saw data labeling as a way to bring decent wages and skilled work to people in the developing world.

Alexandr Wang, the 23-year-old founder and CEO of Scale AI, which has worked with a number of self-driving car companies, says that the “dirty secret” of artificial intelligence is that getting the software to work well in the real world requires a large amount of high-quality data.

“Where the rubber hits the road is what does the data these A.I. systems are trained on look like?” he says. “Is that data biased? Is that data high quality? Does that data have noise? Is that data comprehensive?”

Providing labels can be relatively low-skilled work (identifying “cats” in videos) performed by thousands of contractors in traditional outsourcing hubs such as India, Romania, or the Philippines, or it can be much higher-skilled work performed by radiologists (outlining the exact contours of a tumor on a medical scan) or lawyers (identifying a non-compete clause in a contract). Often companies need both general and more expert labeling and employ a combination of outsourcing firms, freelancers, and in-house experts to affix these annotations. The labels can take the form of bounding boxes around objects, visual or text tags on items in photographs, or classifications entered into a separate text-based database that accompanies the original data.

Wang says that with such complex workflows, data governance (how companies track what data they are using, who’s using it, and what they are doing with it) is critical. “It isn’t sexy, but it really matters,” he says. Companies trying to deploy machine learning are often slowed because they don’t have systems in place to manage data labeling efficiently, he says.

Both Scale AI and Labelbox provide tools to help companies’ machine learning and data science teams analyze the data once it is labeled, allowing them to identify blind spots and biases. For example, are men overrepresented in your X-ray data (a bias)? Or did you have too few examples of cats running across the road to train your self-driving algorithm to brake for them (a blind spot)? “Every A.I. company needs tools to edit, manage, and review labels,” Manu Sharma, Labelbox’s co-founder and CEO, says.
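To make the idea of checking labeled data for biases and blind spots concrete, here is a minimal Python sketch. It is not Labelbox's or Scale AI's tooling. The record fields (`sex`, `object`), the helper names `share_by` and `rare_values`, the toy data, and the threshold are all hypothetical, chosen only to show how counting labels can surface a skewed or thin category.

```python
from collections import Counter

def share_by(records, field):
    """Return each value's share of the labeled records for one field."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def rare_values(records, field, min_count):
    """Return values seen fewer than min_count times (possible blind spots)."""
    counts = Counter(r[field] for r in records)
    return sorted(v for v, n in counts.items() if n < min_count)

# Hypothetical labeled data, invented for illustration only.
xrays = [{"sex": "M"}] * 8 + [{"sex": "F"}] * 2
road_clips = (
    [{"object": "car"}] * 50
    + [{"object": "pedestrian"}] * 30
    + [{"object": "cat"}] * 2
)

# Share of each sex in the toy X-ray set (a skew here would suggest bias).
xray_shares = share_by(xrays, "sex")
# Object classes with fewer than 10 examples (candidate blind spots).
blind_spots = rare_values(road_clips, "object", min_count=10)
```

In this toy set, men account for 80% of the X-rays, and cats are the only road object with fewer than 10 examples. What counts as "too skewed" or "too few" is a judgment call that depends on the application.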

Michael Phillippi, vice president of technology at Lytx, a San Diego company that sells systems that allow trucking businesses to assess and track drivers’ behavior through cameras and sensor data, says it takes about 10,000 hours of labeled 20-second video clips to train a prototype A.I. system to detect something like driver distraction. To put that system into actual production, though, requires four to five million hours of video, he says. That is a lot of labeling.
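A back-of-the-envelope calculation shows what those figures mean in individual clips. The sketch below assumes every clip is exactly 20 seconds long, as Phillippi describes; the function and constant names are illustrative.

```python
SECONDS_PER_HOUR = 3600
CLIP_SECONDS = 20  # clip length cited by Phillippi

def clips_for_hours(hours):
    """Number of 20-second clips needed to fill the given hours of video."""
    return hours * SECONDS_PER_HOUR // CLIP_SECONDS

prototype_clips = clips_for_hours(10_000)      # prototype: 10,000 hours
production_low = clips_for_hours(4_000_000)    # production, low estimate
production_high = clips_for_hours(5_000_000)   # production, high estimate
```

Under that assumption, a prototype takes about 1.8 million labeled clips, and a production system somewhere between 720 million and 900 million.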

John-Isaac Clark is the CEO of Arturo.ai, a spinout from American Family Insurance that specializes in machine learning software to analyze images, including satellite and aerial photography, for the insurance industry. He says that large, well-labeled data sets are especially important for training A.I. software to correctly identify “edge cases”: unusual or rare situations.

Humans can often use common sense to deal with these situations, even when they haven’t encountered them before. Most A.I. systems, in contrast, need to have seen multiple examples during training to correctly handle them.

Both Arturo and Lytx are Labelbox customers. Clark says Labelbox enabled Arturo to reduce the number of employees it needed to supervise its data labeling contractors from four to just one.

Sharma and his co-founder Brian Rieger, who is now Labelbox’s chief operating officer, met when they both worked in the aeronautics industry, helping to design and test flight control systems. Sharma later worked for Planet Labs, a company that analyzes gigantic datasets of satellite images, where he realized the difficulty companies had with managing labeling tasks for A.I. training data and began thinking of creating a company to address this problem. His other co-founder, Dan Rasmuson, now Labelbox’s chief technology officer, had encountered similar problems working at a company that sold drone imagery.

Labelbox’s software supplies a set of labeling tools for both images and text, as well as a way to distribute data to labelers in such a way that multiple labelers can work on the same data simultaneously without duplicating any labels.
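The article does not say how Labelbox implements this. As one possible illustration, a shared queue that hands each item to exactly one labeler would let several people work at once without labeling the same item twice. The `LabelQueue` class, its `claim` method, and the item and labeler names below are hypothetical.

```python
import threading
from collections import deque

class LabelQueue:
    """Hands out each unlabeled item to exactly one labeler."""

    def __init__(self, items):
        self._pending = deque(items)
        self._assigned = {}            # item -> labeler who claimed it
        self._lock = threading.Lock()  # guards both structures above

    def claim(self, labeler):
        """Atomically give the next pending item to a labeler, or None if empty."""
        with self._lock:
            if not self._pending:
                return None
            item = self._pending.popleft()
            self._assigned[item] = labeler
            return item

queue = LabelQueue(["img_001", "img_002", "img_003"])
first = queue.claim("alice")
second = queue.claim("bob")
third = queue.claim("alice")
fourth = queue.claim("bob")  # nothing left to hand out
```

Because an item leaves the pending queue the moment it is claimed, no two labelers can receive the same item, even when they request work at the same time.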

Some companies in the labeling space, such as Scale AI and Hive, provide labeling services themselves. In fact, Scale AI uses its own A.I. software to automatically generate labels for certain kinds of data. These labels are then checked by humans to ensure accuracy, Wang says.

Automatic labeling, he says, allows Scale AI’s customers to benefit from the work Scale AI has done in the past—if it has already built a system to detect cars in videos, for instance, customers may not need to train their own system from scratch. Even in cases where customers want to build their own models, he says, automatic labeling makes the process more efficient.

Labelbox, meanwhile, has taken a different approach. It doesn’t perform any labeling itself. Instead, it’s a tool for managing labeling projects and data across different contract labelers, who often work for large outsourcing firms. The software also allows Labelbox’s customers to audit the quality of labeling contractors. Labelbox gets paid based on how much data a customer runs through the software.

Andreessen Horowitz’s Levine compares Labelbox to GitHub, the software code repository that many companies use to manage their code. GitHub, which Microsoft acquired for $7.5 billion in 2018, was an Andreessen Horowitz investment. “Labelbox has the potential to fill a similar role for data in the AI/ML world,” Levine writes in response to emailed questions, using shorthand for artificial intelligence and machine learning. He says the platform can serve as “a single source of truth” for training data across an organization.

This story has been updated to correct the spelling of Labelbox chief technology officer Dan Rasmuson’s last name.


About the Author

Jeremy Kahn is the AI editor at Fortune, spearheading the publication's coverage of artificial intelligence. He also co-authors Eye on AI, Fortune’s flagship AI newsletter.

