Wednesday, June 17, 2026 · Vol. III · No. 168
The Mining, Energy & Technology Wire
Technology · Analysis

Open Source AI Models Leading Benchmarks

Z.AI's GLM-5.2 scored 51 on the Artificial Analysis Intelligence Index, claiming the top position among open-weight models and placing fourth overall behind only proprietary systems.

Z.AI released GLM-5.2 on June 16, 2026, and the model scored 51 on the Artificial Analysis Intelligence Index v4.1, the highest score among open-weight models. That places it fourth overall on the leaderboard, behind only Claude Fable 5 (60), Claude Opus 4.8 (56), and OpenAI's GPT-5.5 at xhigh reasoning (55).

The result marks a milestone in the narrowing gap between open-weight and proprietary AI systems. Among open models, GLM-5.2 leads MiniMax-M3 (44), DeepSeek V4 Pro (max, 44), and Kimi K2.6 (43). Z.AI released the model's weights under the MIT license, which permits commercial use.

Can Open Models Match Proprietary Performance?

On real-world economic tasks, the answer is increasingly yes. GLM-5.2 scores 1524 on GDPval-AA v2, ahead of MiniMax-M3 (1418) and DeepSeek V4 Pro (max, 1328). That puts GLM-5.2 in line with proprietary models, including GPT-5.5 (xhigh reasoning) at 1514.

On industry-standard third-party benchmarks, GLM-5.2 outperforms most open-source flagship models and scores near or above its closed-weight rivals. It is strongest in agentic tool use and long-horizon software engineering, where its SWE-bench Pro score of 62.1 beats GPT-5.5's 58.6.

The model isn't without trade-offs. GLM-5.2 uses roughly 43,000 output tokens per Intelligence Index task, of which 37,000 are spent on reasoning alone, up sharply from GLM-5.1's 26,000. Even so, GLM-5.2 sits on the Pareto frontier of the Intelligence vs Cost per Task chart, with the lowest cost per task among models at its intelligence level.
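
For readers unfamiliar with the term, a model is on the Pareto frontier of such a chart when no other model is both cheaper per task and at least as intelligent. The sketch below shows one way to compute that set. The model names, costs, and scores in it are purely hypothetical placeholders, not figures from Artificial Analysis, and the function is an illustration rather than the method the leaderboard itself uses.

```python
from typing import NamedTuple


class ModelPoint(NamedTuple):
    name: str
    cost_per_task: float  # USD, lower is better
    intelligence: float   # index score, higher is better


def pareto_frontier(points: list[ModelPoint]) -> list[ModelPoint]:
    """Return the points that no other point beats on both cost and score."""
    frontier: list[ModelPoint] = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; at equal cost, best score first.
    ordered = sorted(points, key=lambda p: (p.cost_per_task, -p.intelligence))
    for p in ordered:
        # A point survives only if it beats every cheaper (or equal-cost) point.
        if p.intelligence > best_score:
            frontier.append(p)
            best_score = p.intelligence
    return frontier


# Hypothetical placeholder data, for illustration only.
points = [
    ModelPoint("A", cost_per_task=0.10, intelligence=40),
    ModelPoint("B", cost_per_task=0.20, intelligence=51),
    ModelPoint("C", cost_per_task=0.50, intelligence=50),  # costlier and weaker than B
    ModelPoint("D", cost_per_task=0.80, intelligence=60),
]

frontier_names = [p.name for p in pareto_frontier(points)]
print(frontier_names)
```

In this toy data set, model C drops out because B is both cheaper and higher-scoring; A, B, and D remain because each offers the best score available at or below its cost.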

What's Driving the Open-Weight Surge?

Chinese AI labs now dominate the open-weight leaderboard, holding four of the top five positions, with Google's Gemma 4 as the sole Western entry in the top tier. Meta's Llama 4, from the model family that defined the open-source AI category in 2023-2024, now trails the leading Chinese open models by a wide margin on pure benchmark performance.

The timing of GLM-5.2's release carries geopolitical weight. Zhipu AI, the Chinese company that operates as Z.AI, announced on June 13 that GLM-5.2 would be released as open-source software under the MIT license, in response to tightened US AI export controls. The announcement came just two days after the US Commerce Secretary ordered Anthropic to block foreign access to its Fable 5 and Mythos 5 models within 48 hours.

Shares of Beijing-based Zhipu AI soared on Monday after it released GLM-5.2, with the Hong Kong-listed stock surging as much as 48 per cent to HK$1,620 in morning trading and ending the day up 32.8 per cent at HK$1,457, according to the South China Morning Post.
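
As a quick consistency check on those figures, each price and percentage pair implies a previous closing price, and the two should roughly agree. The sketch below uses only the numbers quoted above; the function name is illustrative, and the small difference between the two results is what one would expect from a rounded "as much as 48 per cent".

```python
# Figures as reported by the South China Morning Post (quoted above).
INTRADAY_HIGH_HKD = 1620.0
INTRADAY_GAIN = 0.48      # "as much as 48 per cent", likely rounded
CLOSE_HKD = 1457.0
CLOSE_GAIN = 0.328        # "up 32.8 per cent"


def implied_previous_close(price: float, gain: float) -> float:
    """Back out the prior close from a price and its fractional gain."""
    return price / (1.0 + gain)


from_high = implied_previous_close(INTRADAY_HIGH_HKD, INTRADAY_GAIN)
from_close = implied_previous_close(CLOSE_HKD, CLOSE_GAIN)
relative_gap = abs(from_high - from_close) / from_close

print(f"Implied previous close from intraday high: HK${from_high:,.2f}")
print(f"Implied previous close from closing price: HK${from_close:,.2f}")
print(f"Relative gap between the two estimates:    {relative_gap:.2%}")
```

Both estimates land near HK$1,095 to HK$1,097, so the reported intraday and closing moves are consistent with each other.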

How Much Does the Performance Gap Matter?

As of April 2026, the best open-weight model (GLM-5, at 85 on BenchLM) still trailed the proprietary leaders by roughly 9 points, with the top closed models from OpenAI, Anthropic, and Google scoring around 94. But for most practical applications, such as summarization, code generation, data extraction, customer support, and content creation, the difference between an 85-point open model and a 94-point closed model is often invisible to end users.

The MMLU benchmark gap narrowed from 17.5 to just 0.3 percentage points in a single year, according to analysis from Swfte AI. What was once a years-long frontier gap is now measured in months, or even weeks.

The cost advantage is substantial. Pricing for GLM-5.2 (max) is $1.40 per 1M input tokens and $4.40 per 1M output tokens. According to VentureBeat, GLM-5.2 is the first open-weights model to cross 80% on Terminal-Bench and beats every other open model available, as well as Gemini, making it a frontier-level model for a fraction of the cost.
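
To put the pricing in per-task terms, the sketch below combines the $4.40 per 1M output-token price with the roughly 43,000 output tokens per Intelligence Index task reported earlier. It rests on two assumptions that the article does not confirm: that reasoning tokens are billed at the output-token rate, and that the token counts apply to the same (max) configuration as the pricing. Input-token cost is left out because no per-task input count is given.

```python
# Figures quoted in the article.
OUTPUT_PRICE_PER_M_USD = 4.40        # GLM-5.2 (max), USD per 1M output tokens
OUTPUT_TOKENS_PER_TASK = 43_000      # approx. output tokens per Intelligence Index task
REASONING_TOKENS_PER_TASK = 37_000   # portion of those spent on reasoning


def token_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost in USD of `tokens` tokens at a per-million-token price."""
    return tokens * price_per_million_usd / 1_000_000


# Assumption: reasoning tokens are billed at the output-token rate.
output_cost = token_cost_usd(OUTPUT_TOKENS_PER_TASK, OUTPUT_PRICE_PER_M_USD)
reasoning_cost = token_cost_usd(REASONING_TOKENS_PER_TASK, OUTPUT_PRICE_PER_M_USD)
reasoning_share = REASONING_TOKENS_PER_TASK / OUTPUT_TOKENS_PER_TASK

print(f"Output-token cost per task: ${output_cost:.4f}")
print(f"Of which reasoning:         ${reasoning_cost:.4f} ({reasoning_share:.1%})")
```

Under those assumptions, output tokens cost about $0.19 per task, and about 86% of that spend goes to reasoning.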

What About Energy Infrastructure?

The shift toward open-weight models carries implications for data center energy consumption. The International Energy Agency's Electricity 2024 report projects that data centers could consume over 1,000 TWh by 2026 under high-growth scenarios, more than double current levels.

Mark Surman, president of the Mozilla Foundation, said that the high cost of training and running large AI models creates incentives to innovate in ways that lower cost and energy use. He noted that "the cost structure of open-source AI is very different from traditional open-source software because of the compute, energy, and infrastructure," according to EE Times.

Meta's open-weight models inspired more efficient fine-tuning methods such as QLoRA, according to Carnegie Mellon University research. Sharing resources that are energy-intensive to create, such as model weights, can reduce industry-wide energy consumption.

Traditional enterprise data centers typically consumed 10 to 20 MW. Today, AI-ready sites often require 100 to 300 MW, and some hyperscale campuses are approaching 1 GW, roughly the equivalent of powering 800,000 homes.
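
The homes comparison can be unpacked with simple arithmetic. The sketch below derives the average household draw that the 1 GW to 800,000 homes equivalence implies. It assumes continuous draw at that average over a full year, which is a simplification made here and not something the article states.

```python
CAMPUS_POWER_W = 1e9        # 1 GW hyperscale campus (from the article)
HOMES = 800_000             # equivalent homes (from the article)
HOURS_PER_YEAR = 8_760      # 365 days x 24 hours

# Average continuous power per home implied by the comparison.
avg_kw_per_home = CAMPUS_POWER_W / HOMES / 1_000

# Assumption: each home draws that average power around the clock.
annual_kwh_per_home = avg_kw_per_home * HOURS_PER_YEAR

print(f"Implied average draw per home: {avg_kw_per_home:.2f} kW")
print(f"Implied annual use per home:   {annual_kwh_per_home:,.0f} kWh")
```

The comparison therefore implies an average of 1.25 kW per home, or about 10,950 kWh per home per year.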

What Changed This Week

A Chinese AI lab released the highest-scoring open-weight model on a major intelligence benchmark, placing fourth overall and matching proprietary systems on real-world economic tasks. The model launched under an MIT license days after US export controls blocked foreign access to Anthropic's frontier models. Shares of the Beijing-based company surged as much as 48% on the news before closing up 32.8%, and the model is already integrated into multiple development platforms.

What to Watch

GLM-5.2's full model weights are already live on Hugging Face under the handle zai-org/GLM-5.2. Independent verification of the vendor-reported benchmarks will determine whether the performance claims hold under third-party testing. The broader question is whether open-weight models continue closing the gap with proprietary systems at the current pace, and what that means for data center infrastructure planning as enterprises weigh the cost and sovereignty benefits of self-hosted AI against the convenience of API-based services.


Reporting based on coverage from Artificial Analysis, VentureBeat, South China Morning Post, Crypto Briefing, Office Chai, EE Times, Carnegie Mellon University, June 13-17, 2026.

Original reporting and analysis by the Stake & Paper editorial team. See linked sources within the article.
