Tech | AI

OpenAI’s deep research can complete 26% of Humanity’s Last Exam—a benchmark for the frontier of human knowledge

By Greg McKenna, News Fellow
February 12, 2025, 1:58 AM ET
Sam Altman, CEO of OpenAI, whose AI agent has set a new standard of performance on Humanity’s Last Exam. Nathan Laine—Bloomberg/Getty Images

Artificial intelligence may be more than a quarter of the way to surpassing the boundaries of human knowledge. OpenAI’s new autonomous agent, deep research, has stormed past competing models and set a new standard on Humanity’s Last Exam, a global benchmark created to determine when AI can answer questions on any topic better than a world-class expert in the field.

Deep research successfully completed 26.6% of the recently developed test, which consists of over 3,000 questions across hundreds of subjects ranging from rocket science to analytic philosophy. Powered by OpenAI’s frontier o3 model, the AI agent can synthesize a wide range of information and complete multistep research within five to 30 minutes, its creators say.

OpenAI’s o1 and DeepSeek’s R1 models, which previously sat atop the leaderboard, could only get through roughly 9% of the exam, meaning OpenAI’s new agent represents a nearly threefold jump in performance. The company said the largest gains appeared on inquiries related to chemistry, humanities and social sciences, and mathematics.

Frank Downing, a director of research at Cathie Wood’s ARK Invest, noted that OpenAI’s new agent also set a new state-of-the-art score on GAIA, a test for AI assistants that poses real-world questions that are conceptually simple for humans, but challenging for most digital agents. The new offering provides deeper research and analysis, he added, compared with a competing product launched by Google in December.

But all those accomplishments could look minuscule, Downing said, if subsequent models from OpenAI and competitors make progress on solving Humanity’s Last Exam at a pace similar to how weaker AI models conquered previous academic benchmarks.

“Humanity’s Last Exam could be saturated within the next 12 months,” he wrote in a note Monday, “effectively surpassing expert-level technical knowledge and reasoning capability.”

What is Humanity’s Last Exam?

The test is the result of an effort led by Dan Hendrycks, the director of the Center for AI Safety and an advisor for companies such as Scale AI and Elon Musk’s xAI. He had previously created another exam called Massive Multitask Language Understanding, or MMLU, which cutting-edge versions of Anthropic’s Claude, Meta’s Llama, and OpenAI’s ChatGPT have been able to mostly crack as of late last year.

Hendrycks said he was inspired to create Humanity’s Last Exam after a conversation with Musk about existing AI tests being too easy.

“Elon looked at the MMLU questions and said, ‘These are undergrad level. I want things that a world-class expert could do,’” Hendrycks told the New York Times in January.

So Hendrycks, with support from Scale AI, spearheaded a project designed to serve as “the final closed-ended academic benchmark of its kind with broad subject coverage.” His team compiled questions submitted by hundreds of college professors, prize-winning mathematicians, and other experts in their fields.

“[The exam] emphasizes world-class mathematics problems aimed at testing deep reasoning skills broadly applicable across multiple academic areas,” the team wrote in a paper debuting the test in January.

Once models start scoring over 50%, Hendrycks said, it’s safe to say humans have met their match in this regard. After that, the clock is presumably ticking until the world witnesses what is termed artificial general intelligence, or the ability of a machine to possess all the cognitive abilities of humans. OpenAI says it envisions this technology, commonly dubbed AGI, as being capable of producing novel scientific research.

“We are now confident we know how to build AGI as we have traditionally understood it,” OpenAI CEO Sam Altman said in a blog post in January.

On Sunday, Google DeepMind CEO Demis Hassabis said it could arrive in just five years.

“And I think society needs to get ready for that and what implications that will have,” he said in Paris ahead of the AI Action Summit hosted by the city, CNBC reported.

On that front, time seems to be of the essence.

About the Author

Greg McKenna is a news fellow at Fortune.
