The world’s best AI models operate in English. Other languages—even major ones like Cantonese—risk falling further behind

By Cecilia Hult
July 15, 2025, 8:20 AM ET
AI models can struggle with non-English languages, like Cantonese. (Getty Images)

How do you translate “dim sum”? Many English speakers would find the question strange, knowing the term refers to the large array of small dishes that accompanies a Cantonese-style brunch—and so doesn’t need translation. 

But words like “dim sum” are a challenge for developers like Jacky Chan, who launched a Cantonese large language model last year through his startup Votee. It might be obvious to a human translator which words are loanwords and which need direct translation. Yet it’s less intuitive for machines.

“It’s not natural enough,” Chan says. “When you see it, you know it’s not something a human writes.”

Translation troubles are part of a growing list of issues that arise when today’s AI models, strongest in English and other major languages, try to work in smaller languages that are still spoken by tens of millions of people.

When AI “models encounter a word they don’t know or that doesn’t exist in another culture, they will simply make up a translation,” explains Aliya Bhatia, a senior policy analyst at the Center for Democracy & Technology, where she researches issues related to multilingual AI. “As a result, many machine-created datasets could feature mistranslations, words that no native speaker actually uses in a specific language.”

LLMs need data, and lots of it. Text from books, articles and websites is broken down into smaller word sequences to form a model’s training dataset. From this, LLMs learn how to predict the next word in a sequence, eventually generating text.  
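As a rough illustration of that idea, the sketch below splits a few sentences into words, counts which word tends to follow which, and then predicts the next word. It is a toy bigram counter, not how any production LLM works, and the corpus and function names are hypothetical, invented for this example.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase words (a crude stand-in for a real tokenizer)."""
    return text.lower().split()


def train_bigram_model(corpus):
    """Count, for each word, which words follow it in the training text."""
    follow_counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = tokenize(sentence)
        for current_word, next_word in zip(tokens, tokens[1:]):
            follow_counts[current_word][next_word] += 1
    return follow_counts


def predict_next(model, word):
    """Return the most frequently seen follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, invented for illustration only.
corpus = [
    "we eat dim sum on sunday",
    "they eat dim sum with tea",
    "dim sum is served with tea",
]
model = train_bigram_model(corpus)
print(predict_next(model, "dim"))  # sum
print(predict_next(model, "yum"))  # None: the word never appeared in training
```

The second call hints at the data problem: a model has nothing to say about words it has never seen, so a language with little training text leaves far more of these gaps.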

AI can now generate text remarkably well—at least, it can in English. In other languages, performance lags significantly. Roughly half of all web content is in English, meaning there’s no shortage of digital resources for LLMs to learn from. Many other languages do not enjoy this same abundance. 

Low-resource languages

So-called low-resource languages are those with limited online data. Endangered languages, no longer being passed down to younger generations, clearly fall into this category. But widely spoken languages like Cantonese, Vietnamese and Bahasa Indonesia are also considered low-resource.

One reason could be limited internet access, which would prevent the creation of digital content. Another could be government regulation, which might limit what’s available online. Indonesia, for example, can remove online content without offering a way to appeal decisions. The resulting self-censorship may mean that available data in some regional languages might not represent authentic local culture. 

This resource gap leads to a performance gap: Non-English LLMs are more likely to produce gibberish or inaccurate answers. LLMs also struggle with languages that don’t use Latin script, the set of letters used in English, as well as those with tonal features that are hard to represent in writing or code.  
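One small, measurable piece of the script problem can be sketched in code. In UTF-8, the standard text encoding on the web, each Latin letter takes one byte while each Chinese character takes three. Whether this matters for a given model is an assumption here, not a claim from the article's sources: it depends on that model's tokenizer, and it applies mainly to systems that fall back to raw bytes for rare characters. The helper name is hypothetical.

```python
def utf8_profile(text):
    """Return (character count, UTF-8 byte count) for a string."""
    return len(text), len(text.encode("utf-8"))


# The same dish name, written in Latin script and in Chinese characters.
samples = {
    "English": "dim sum",
    "Cantonese": "點心",
}

for label, text in samples.items():
    chars, size = utf8_profile(text)
    print(f"{label}: {chars} characters, {size} bytes")
# English: 7 characters, 7 bytes
# Cantonese: 2 characters, 6 bytes
```

Under that assumption, a byte-level system spends three units on every Chinese character where an English letter costs one, before any question of training data arises.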

Currently, the best-performing models work in English and, to a lesser extent, Mandarin Chinese. That reflects where the world’s biggest tech companies are based. But outside of San Francisco and Hangzhou, a legion of developers, large and small, are trying to make AI work for everyone. 

South Korean internet firm Naver has built an LLM, HyperCLOVA X, which it claims is trained on 6,500 times more Korean data than GPT-4. Naver is also working in markets like Saudi Arabia and Thailand in a bid to expand its business creating “sovereign AI,” or AI tailored to a specific country’s needs. “We focus on what companies and governments that want to use AI would want, and what needs Big Tech can’t fulfill,” CEO Choi Soo-Yeon told Fortune last year.  

In Indonesia, telecom operator Indosat and tech startup GoTo are collaborating to launch a 70-billion-parameter LLM that operates in Bahasa Indonesia as well as five other local languages, including Javanese, Balinese, and Bataknese.

One hurdle is scale. The most powerful LLMs are massive, built from billions of parameters, the adjustable numerical values a model tunes as it learns from training text. OpenAI’s GPT-4 is estimated to have around 1.8 trillion parameters. DeepSeek’s R1 has 671 billion.

Non-English LLMs seriously struggle to achieve this kind of scale. The Southeast Asian Languages in One Model (SEA-LION) project has trained two models from scratch: one with 3 billion parameters and one with 7 billion, much smaller than leading English and Chinese models.
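To put those figures side by side, the short calculation below compares the parameter counts quoted above. The storage estimate assumes each parameter is stored as a 16-bit number (2 bytes), which is a common choice but an assumption here, not something the companies have confirmed for these models. The GPT-4 figure is itself an outside estimate.

```python
# Parameter counts as quoted in the article.
PARAMETER_COUNTS = {
    "GPT-4 (estimated)": 1.8e12,
    "DeepSeek R1": 671e9,
    "SEA-LION 7B": 7e9,
    "SEA-LION 3B": 3e9,
}

BYTES_PER_PARAMETER = 2  # Assumption: 16-bit weights.


def weights_size_gb(parameters, bytes_per_parameter=BYTES_PER_PARAMETER):
    """Approximate storage for the weights alone, in gigabytes."""
    return parameters * bytes_per_parameter / 1e9


def scale_ratio(larger, smaller):
    """How many times bigger one parameter count is than another."""
    return larger / smaller


baseline = PARAMETER_COUNTS["SEA-LION 7B"]
for name, count in PARAMETER_COUNTS.items():
    print(
        f"{name}: about {weights_size_gb(count):,.0f} GB of weights, "
        f"{scale_ratio(count, baseline):.1f}x SEA-LION 7B"
    )
```

On these numbers, the estimated GPT-4 is roughly 257 times the size of the larger SEA-LION model, and DeepSeek's R1 is roughly 96 times its size.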

Chan, from Votee, faces these struggles when dealing with Cantonese, spoken by 85 million people across southern China and Hong Kong. Cantonese uses different grammar for formal writing compared to informal writing and speech. Available digital data is scarce and often low-quality. 

Training on digitized Cantonese texts is like “learning from a library with many books, but they have lots of typos, they are poorly translated, or they’re just plain wrong,” says Chan.

Without a comprehensive dataset, an LLM can’t produce complete results. Data for low-resource languages often skews toward formal texts, such as legal documents, religious texts, or Wikipedia entries, since these are more likely to be digitized. This bias can distort an LLM’s tone, vocabulary and style, and limit its knowledge.

LLMs have no inherent sense of what is true, and so false or incomplete information will be reproduced as fact. A model trained solely on Vietnamese pop music might struggle to accurately answer questions on historical events, particularly those not related to Vietnam.  

Translating English content

Turning English content into the target language is one way to supplement the otherwise-limited training data. As Chan explains, “we synthesize the data using AI so that we can have more data to do the training.” 

But machine translation carries risk. It can miss linguistic nuance or cultural context. A Georgia Tech study of cultural bias in Arabic LLMs found that AI models trained on Arabic datasets still exhibited Western bias, such as referencing alcoholic beverages in Islamic religious contexts. It turned out that much of the pre-training data for these models came from web-crawled Arabic content that was machine-translated from English, allowing cultural values to sneak through.  

In the long term, AI-generated content might end up polluting low-resource language datasets. Chan likens it to “a photocopy of a photocopy,” with each iteration degrading the quality. In 2024, a paper published in Nature warned of “model collapse,” where AI-generated text could contaminate the training data for future LLMs, leading to worse performance.

The threat is even greater for low-resource languages. With less genuine content out there, AI-generated content could quickly end up making up a larger share of what’s online in a given language.  
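The arithmetic behind that concern is simple. The sketch below adds the same amount of AI-generated text to a large corpus and to a small one, then compares the resulting share. All word counts are hypothetical, chosen only to show the effect, and are not measurements of any real language.

```python
def synthetic_share(genuine_words, synthetic_words):
    """Fraction of a corpus that is AI-generated."""
    total = genuine_words + synthetic_words
    if total == 0:
        raise ValueError("corpus is empty")
    return synthetic_words / total


# Hypothetical figures for illustration only.
SYNTHETIC_WORDS = 1_000_000_000
GENUINE_WORDS = {
    "high-resource language": 100_000_000_000,
    "low-resource language": 1_000_000_000,
}

for label, genuine in GENUINE_WORDS.items():
    share = synthetic_share(genuine, SYNTHETIC_WORDS)
    print(f"{label}: {share:.1%} AI-generated")
# high-resource language: 1.0% AI-generated
# low-resource language: 50.0% AI-generated
```

The same billion words of machine output barely registers in the larger corpus but makes up half of the smaller one.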

Large businesses are starting to realize the opportunities in building non-English AI. But while these companies are key players in their respective tech sectors, they’re still much smaller than giants like Alibaba, OpenAI, and Microsoft.

Bhatia says more organizations—both for-profit and not-for-profit—need to invest in multilingual AI if this new technology is to be truly global.  

“If LLMs are going to be used to equip people with access to economic opportunities, educational resources, and more, they should work in the languages people use,” she says. 

About the Author
By Cecilia Hult

Cecilia Hult is an editorial intern based in Hong Kong.
