Claude Needs 65% More Tokens for Chinese Than the English Baseline; OpenAI Needs Only 15% More

Gate News message, April 29 — AI researcher Aran Komatsuzaki compared tokenization efficiency across six major AI model families. He translated Rich Sutton’s influential essay “The Bitter Lesson” into nine languages and ran each version through the tokenizers used by OpenAI, Gemini, Qwen, DeepSeek, Kimi, and Claude. Taking the token count of the English version on OpenAI’s tokenizer as the baseline (1x), the study found large disparities: the same content in Chinese required 1.65x the baseline tokens on Claude but only 1.15x on OpenAI. Hindi was even more extreme on Claude, requiring more than 3x the baseline. Overall, Claude’s tokenizer ranked least efficient of the six tested.

Critically, when the identical Chinese text was run through the different tokenizers, each measured against the same English baseline, the results diverged sharply: Kimi needed only 0.81x (fewer tokens than the English original), Qwen 0.85x, and Claude 1.65x. Because the text is the same in every case, the gap reflects tokenizer efficiency rather than any inherent property of Chinese. The models from Chinese developers encoded Chinese most compactly, suggesting the disparity comes from how each tokenizer was optimized.
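Because every figure shares the same denominator (the English-on-OpenAI token count), two models can be compared directly by dividing their ratios; the baseline cancels out. The short Python sketch below applies that arithmetic to the Chinese ratios reported in the article. The dictionary name and the `relative_cost` helper are hypothetical, introduced only for illustration, and since the published ratios are rounded to two decimals, the derived comparisons are approximate.

```python
# Chinese-text token counts as multiples of the English-on-OpenAI baseline,
# using the ratios reported in the article (rounded to two decimals).
CHINESE_RATIOS = {
    "Kimi": 0.81,
    "Qwen": 0.85,
    "OpenAI": 1.15,
    "Claude": 1.65,
}


def relative_cost(ratios: dict[str, float], model: str, reference: str) -> float:
    """Tokens `model` needs per token `reference` needs for the same text.

    Both ratios are divided by the same baseline count T_base, so
    (T_model / T_base) / (T_ref / T_base) simplifies to T_model / T_ref.
    """
    return ratios[model] / ratios[reference]


if __name__ == "__main__":
    # Compare Claude against each of the other reported tokenizers.
    for ref in ("Kimi", "Qwen", "OpenAI"):
        factor = relative_cost(CHINESE_RATIOS, "Claude", ref)
        print(f"Claude vs {ref}: {factor:.2f}x tokens")
```

By this arithmetic, the same Chinese passage costs Claude roughly twice as many tokens as Kimi (1.65 / 0.81 ≈ 2.04) and about 43% more than OpenAI (1.65 / 1.15 ≈ 1.43).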

The practical implications for users are substantial: higher token consumption directly raises API costs, increases response latency, and fills context windows faster. Tokenization efficiency depends on the language mix of the data a tokenizer is built from: tokenizers trained predominantly on English compress English text efficiently, while underrepresented languages are split into shorter, more numerous fragments.
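To make the cost and context effects concrete, the sketch below scales a document's token count by the reported ratios. Per-token billing means cost grows linearly with the ratio, and a fixed context window holds proportionally less content. The baseline document size, the per-million-token price, and the context-window size are all hypothetical values chosen for illustration; they are not figures from the study or any provider's actual pricing.

```python
# Hypothetical inputs for illustration only; none of these come from the study.
BASELINE_TOKENS = 10_000      # assumed: tokens for an English document on OpenAI's tokenizer
PRICE_PER_MILLION = 3.00      # assumed: USD per million input tokens
CONTEXT_WINDOW = 200_000      # assumed: context window size in tokens


def tokens_for(ratio: float, baseline_tokens: int = BASELINE_TOKENS) -> float:
    """Tokens consumed by the same content at `ratio` times the baseline."""
    return ratio * baseline_tokens


def cost_usd(ratio: float) -> float:
    """Input cost, which scales linearly with the token count."""
    return tokens_for(ratio) * PRICE_PER_MILLION / 1_000_000


def baseline_equivalent_capacity(ratio: float, window: int = CONTEXT_WINDOW) -> float:
    """Baseline-equivalent tokens of content that fit in the window at this ratio."""
    return window / ratio


if __name__ == "__main__":
    # Ratios from the article: English baseline, Chinese on OpenAI, Chinese on Claude.
    for label, ratio in (("English baseline", 1.0),
                         ("Chinese on OpenAI", 1.15),
                         ("Chinese on Claude", 1.65)):
        print(f"{label}: {tokens_for(ratio):,.0f} tokens, "
              f"${cost_usd(ratio):.4f}, "
              f"window holds ~{baseline_equivalent_capacity(ratio):,.0f} baseline tokens")
```

Under these assumed numbers, the Chinese version costs 1.65x as much on Claude, and the same window holds only about 61% (1 / 1.65) as much content as it would for the English baseline.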

Komatsuzaki concludes that market size drives tokenization efficiency: languages with larger markets receive better-optimized tokenizers, while underrepresented languages pay significantly higher token costs.
