The Token Revolution in AI: China's Rising Influence

A deep dive into how the concept of tokens is reshaping the AI industry, highlighting China's leading role in this transformation.

In early 2026, one set of figures sparked intense discussion across the global AI industry. OpenRouter, the world’s largest AI model API aggregation platform, reported that from February 9 to 15, Chinese large models handled 41.2 trillion tokens of calls, surpassing U.S. models (29.4 trillion) for the first time. The lead held for several weeks: by mid-to-late March the volume had exceeded 73 trillion, and four of the top five models globally were Chinese.

These figures matter less as a head-to-head comparison than as a sign of a quiet revolution in the AI industry’s basic unit of measurement: tokens are becoming the “kilowatt-hour” of the intelligent era. Six dimensions (models, computing power, data, applications, industry, and governance) are being profoundly reshaped by this unit. Understanding AI in 2026 begins with understanding tokens.

Sixfold Reconstruction from a Measurement Unit

The measurement unit of the Industrial Revolution was the “kilowatt-hour,” allowing energy to be accurately measured, priced, and transmitted across domains. The Information Revolution’s units were “bits” and “bandwidth,” enabling information to be packaged, transmitted, and billed for the first time. The measurement unit of the Intelligent Revolution is “tokens,” allowing intelligence to be segmented, measured, priced, and traded for the first time.

The popularization of the token concept and its rapid growth in usage are gradually pushing intelligence towards industrialization, marketization, and circulation.

Models

From “training as an asset” to “inference as production.” The economic value of large models is shifting from one-time training costs to long-term inference output. Model vendors no longer simply “sell capabilities” but directly “sell tokens”: pricing per million tokens, quoted separately for input and output, has become a global industry norm. The asset nature of a model is shifting from its “weight files” to “the ability to continuously produce tokens.”
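To make the per-million-token pricing model concrete, here is a minimal sketch of how a single API call might be billed. The `TokenPrice` class, the `request_cost` function, and the prices themselves are hypothetical, chosen only for illustration; they are not any vendor’s actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical price sheet, quoted per one million tokens."""
    input_per_million: float   # currency units per 1M input tokens
    output_per_million: float  # currency units per 1M output tokens


def request_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call: each side is billed in proportion to its token count."""
    return (input_tokens * price.input_per_million
            + output_tokens * price.output_per_million) / 1_000_000


# Made-up prices for illustration only; not real rates.
example_price = TokenPrice(input_per_million=2.0, output_per_million=8.0)
cost = request_cost(example_price, input_tokens=12_000, output_tokens=3_000)
print(f"{cost:.4f}")  # 0.0480
```

Under these assumed prices, 12,000 input tokens and 3,000 output tokens each contribute 0.024, which shows how output tokens can dominate a bill when they are priced higher than input tokens.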

Computing Power

From “training computing power” to “inference computing power.” Training computing power is bursty and centralized, while inference computing power is continuous and distributed, which places new demands on latency, energy efficiency, and geographical distribution. Cloud-edge-device coordination of computing power, inference-specific chips, silicon photonics interconnects, and computing power networks are becoming the new focus of infrastructure. JPMorgan predicts that China’s inference token consumption in 2030 will be more than two orders of magnitude higher than in 2025.
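As a rough check on what “two orders of magnitude” implies, the sketch below converts a 100-fold increase over the five years from 2025 to 2030 into a constant year-over-year multiple. The function name is hypothetical, and smooth compounding is an assumption made for illustration; the forecast itself gives only the endpoints.

```python
def implied_annual_factor(total_multiple: float, years: int) -> float:
    """Constant yearly growth multiple that compounds to total_multiple over `years`."""
    return total_multiple ** (1.0 / years)


# Two orders of magnitude = 100x; 2025 -> 2030 = 5 years (assumed smooth compounding).
factor = implied_annual_factor(100.0, 5)
print(f"{factor:.2f}x per year")  # 2.51x per year
```

Under this assumption, consumption would have to grow roughly two and a half times every year, that is, by about 150% annually, to reach the lower bound of the forecast.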

Data

From “raw data” to “tokenized corpora.” Just as raw coal must be processed into standard fuel before it can generate power, data must be cleaned, labeled, and tokenized before it enters a large model. In long-tail scenarios such as autonomous driving, robot training, and scientific discovery, synthetic data generated through simulation is now used at scale. The construction of data factor markets has also entered a substantive phase, with “trainability” and “token output density,” rather than sheer data volume, becoming the new metrics for pricing data assets. The shift is profound: the value of data is now tied to its actual contribution to the token production chain, which gives the market-based allocation of data factors a more solid economic foundation.

Applications

From “function delivery” to “token consumption.” Traditional software charges based on seats and functions; today, applications bill based on token call volume and business results. Intelligent agents are becoming the main consumers of tokens, with complex tasks potentially consuming hundreds of thousands or even millions of tokens. The “intelligent agent as a service” market is rapidly expanding, with performance-based billing models being implemented at scale in customer service, marketing, compliance, and programming. The essence of applications is shifting from “delivering functions” to “consuming intelligence.”
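One way to see why a single agent task can reach hundreds of thousands of tokens is a toy model of a multi-step agent loop. The sketch assumes, as is common with chat-style APIs but is not stated in the article, that every step re-sends the system prompt plus the full history so far. All names and numbers are hypothetical.

```python
from typing import Tuple


def agent_task_tokens(steps: int, system_tokens: int,
                      tool_result_tokens: int,
                      output_tokens_per_step: int) -> Tuple[int, int]:
    """Return (total_input_tokens, total_output_tokens) for a toy agent loop.

    Assumption: each step sends the system prompt plus the whole history,
    and the history grows by one model output and one tool result per step.
    """
    total_in = 0
    total_out = 0
    history = 0
    for _ in range(steps):
        total_in += system_tokens + history      # full context is re-sent
        total_out += output_tokens_per_step      # model reply for this step
        history += output_tokens_per_step + tool_result_tokens
    return total_in, total_out


# Illustrative numbers only: 30 steps, 2,000-token system prompt,
# 1,500-token tool results, 500-token model replies.
total_in, total_out = agent_task_tokens(30, 2_000, 1_500, 500)
print(total_in, total_out)  # 930000 15000
```

In this toy setup the input side grows roughly with the square of the number of steps, so a 30-step task consumes close to a million tokens even though each individual reply is short.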

Industry

From “software industry chain” to “token industry chain.” A new industry chain is forming around the production (models and computing power), distribution (inference networks, APIs, intelligent agent protocols), consumption (applications and intelligent agents), and measurement (evaluation benchmarks, auditing, and trusted verification) of tokens. The boundaries between the model layer, the inference service layer, the intelligent agent middleware layer, and the industry application layer are becoming increasingly clear, and industry-specific intelligent agents have become a mainstream investment target. Model vendors, cloud vendors, chip manufacturers, green energy operators, and content delivery network providers are forming a collaborative ecosystem along the token industry chain. According to the China Academy of Information and Communications Technology, the scale of China’s core AI industry is expected to exceed 1.2 trillion yuan by 2026, and the benefits of collaboration across the whole industry chain are becoming evident.

Governance

From “algorithm governance” to “full-chain governance of tokens.” As the AI industry has developed, the focus of governance has expanded from “algorithms and code” to the entire chain of token production, circulation, and consumption. Token traceability, identification of synthetic content, cross-border token flows, constraints on computing power and energy consumption, and trusted evaluation and benchmarks are all new issues that call for new governance tools and rules. The year 2026 may become a key year for the concentrated implementation of global AI governance rules.

China’s Position in the Global Token Wave

In the global wave brought by tokens, China is forming a unique position supported by multiple factors.

On the production side, clusters of domestic models are rising. A number of domestic model makers and models, such as MiniMax, Moonshot AI, DeepSeek, Zhipu, Alibaba’s Qwen, and ByteDance’s Doubao, have used mixture-of-experts architectures and extreme engineering optimization to keep improving performance while cutting inference prices to a fraction of those of comparable global models. On the OpenRouter platform, U.S. users account for 47% and Chinese users for only about 6%, yet call volume is led by Chinese models. This is a verdict delivered by global developers voting with their feet.

On the consumption side, applications are penetrating deeper than ever, and tokens are entering daily life at unprecedented speed. A general practitioner in a county hospital, faced with a suspicious lung CT scan, can have AI circle the nodules and offer differential diagnosis suggestions in a few seconds and a few thousand tokens, compressing a consultation that used to take two weeks into a single outpatient visit. A farmer in Shouguang, Shandong, can photograph a curled cucumber, and a smart agriculture app draws on tokenized agricultural knowledge to identify whether the cause is thrips or a viral disease and which treatment to use. An elderly person living alone can tell a smart speaker in their own dialect, “I feel chest tightness,” and after a few thousand tokens of conversation, their children’s phones receive an alert and their location is shared with emergency services. Delivery riders no longer hear mechanical instructions like “turn right ahead” but receive route plans based on real-time traffic and elevator wait times. AI assistants in government service halls answer questions around the clock about medical insurance transfers, real estate registration, and other policies, turning “people running errands” into “tokens running errands.” Tokens are becoming the “invisible labor force” across industries.

At the industry chain level, a full-stack collaborative ecosystem is rapidly taking shape. From domestic chips such as Ascend, Cambricon, and Hygon, to inference service platforms such as Volcano Engine, Alibaba Cloud, and Tencent Cloud, to a range of open-source middleware and industry-specific intelligent agents, the entire chain covering chips, computing power, models, middleware, and applications is maturing quickly. The “Eastern Data, Western Computing” project provides low-cost computing power, while green energy supplied directly to data centers strengthens the energy foundation.

However, it is essential to recognize that significant room for improvement remains in areas such as original model innovation, the foundations of high-end computing power, ecosystem influence across languages and cultures, and participation in global rule-making.

The second half of the token wave is not about having “already won” but about “just beginning.” In the global landscape unfolding from the small token, China is a massive market and should also be an active builder and a responsible co-governor. Understanding tokens is key to understanding the next phase of artificial intelligence.
