
OpenAI’s deep research can complete 26% of Humanity’s Last Exam—a benchmark for the frontier of human knowledge

By Greg McKenna, News Fellow
February 12, 2025, 1:58 AM ET
Sam Altman, CEO of OpenAI, whose AI agent has set a new standard of performance on Humanity’s Last Exam. (Photo: Nathan Laine, Bloomberg/Getty Images)

Artificial intelligence may be more than a quarter of the way to surpassing the boundaries of human knowledge. OpenAI’s new autonomous agent, deep research, has stormed past competing models and set a new standard on Humanity’s Last Exam, a global benchmark created to determine when AI can answer questions on any topic better than a world-class expert in the field.


Deep research correctly answered 26.6% of the recently developed test, which consists of over 3,000 questions across hundreds of subjects ranging from rocket science to analytic philosophy. Powered by OpenAI’s frontier o3 model, the AI agent can synthesize a wide range of information and complete multistep research in five to 30 minutes, its creators say.

OpenAI’s o1 and DeepSeek’s R1 models, which previously sat atop the leaderboard, could answer only roughly 9% of the exam correctly, meaning OpenAI’s new agent represents a nearly threefold jump in performance. The company said the largest gains appeared on questions related to chemistry, humanities and social sciences, and mathematics.

Frank Downing, a director of research at Cathie Wood’s ARK Invest, noted that OpenAI’s new agent also set a new state-of-the-art score on GAIA, a test for AI assistants that poses real-world questions that are conceptually simple for humans, but challenging for most digital agents. The new offering provides deeper research and analysis, he added, compared with a competing product launched by Google in December.

But all those accomplishments could look minuscule, Downing said, if subsequent models from OpenAI and competitors make progress on Humanity’s Last Exam at a pace similar to how weaker AI models conquered previous academic benchmarks.

“Humanity’s Last Exam could be saturated within the next 12 months,” he wrote in a note Monday, “effectively surpassing expert-level technical knowledge and reasoning capability.”

What is Humanity’s Last Exam?

The test is the result of an effort led by Dan Hendrycks, the director of the Center for AI Safety and an advisor for companies such as Scale AI and Elon Musk’s xAI. He had previously created another exam called Massive Multitask Language Understanding, or MMLU, which cutting-edge versions of Anthropic’s Claude, Meta’s Llama, and OpenAI’s ChatGPT have been able to mostly crack as of late last year.

Hendrycks said he was inspired to create Humanity’s Last Exam after a conversation with Musk about existing AI tests being too easy.

“Elon looked at the MMLU questions and said, ‘These are undergrad level. I want things that a world-class expert could do,’” Hendrycks told the New York Times in January.

So Hendrycks, with support from Scale AI, spearheaded a project designed to serve as “the final closed-ended academic benchmark of its kind with broad subject coverage.” His team compiled questions submitted by hundreds of college professors, prize-winning mathematicians, and other experts in their fields.

“[The exam] emphasizes world-class mathematics problems aimed at testing deep reasoning skills broadly applicable across multiple academic areas,” the team wrote in a paper debuting the test in January.

Once models start scoring over 50%, Hendrycks said, it’s safe to say humans have met their match in this regard. After that, the clock is presumably ticking until the world witnesses what is termed artificial general intelligence: a machine that possesses all the cognitive abilities of humans. OpenAI says it envisions this technology, commonly dubbed AGI, as being capable of producing novel scientific research.

“We are now confident we know how to build AGI as we have traditionally understood it,” OpenAI CEO Sam Altman said in a blog post in January.

Google DeepMind CEO Demis Hassabis said AGI could arrive in just five years.

“And I think society needs to get ready for that and what implications that will have,” he said in Paris on Sunday ahead of the AI Action Summit hosted by the city, CNBC reported.

On that front, time seems to be of the essence.

About the Author

Greg McKenna is a news fellow at Fortune.




© 2026 Fortune Media IP Limited. All Rights Reserved.