Not long ago, a trillion-parameter AI model required a data center, a research team, and hundreds of millions of dollars.
zen4-ultra is that model — now open, freely downloadable, and available to anyone with the compute to run it.
What Is zen4-ultra
zen4-ultra is built on Kimi K2.5 — Moonshot AI's frontier Mixture of Experts model. The architecture:
- 1.04 trillion total parameters
- 32 billion active parameters per forward pass (384 experts, top-8 routing)
- 256K token context window
- DeepseekV3ForCausalLM architecture
Despite having a trillion parameters, zen4-ultra activates only ~3% of them per token. This is the power of MoE: frontier intelligence at efficient inference cost.
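The activation ratio quoted above is easy to verify from the architecture numbers; a quick back-of-envelope check:

```python
# Back-of-envelope check of the MoE activation ratio described above.
total_params = 1.04e12    # 1.04T total parameters
active_params = 32e9      # 32B active per forward pass

active_fraction = active_params / total_params
print(f"Active per token: {active_fraction:.1%}")

# Routing: each token is dispatched to 8 of 384 experts.
experts_total, experts_topk = 384, 8
print(f"Experts used per token: {experts_topk}/{experts_total} "
      f"({experts_topk / experts_total:.1%})")
```

Only ~3.1% of weights participate in any given token, which is why a 1T-parameter model can serve at roughly the inference cost of a dense 32B model.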
Benchmark Results
| Task | Score |
|---|---|
| AIME 2025 (with Python) | 99.1% |
| HMMT 2025 (with Python) | 95.1% |
| GPQA-Diamond | 84.5% |
| SWE-Bench Verified | 71.3% |
| BrowseComp | 60.2% |
| HLE | 44.9% |
| Terminal-Bench | 47.1% |
A 71.3% SWE-Bench Verified score means zen4-ultra autonomously resolves roughly 71 of every 100 real GitHub issues in the benchmark — from diagnosing the bug to producing a working patch.

Agentic Capabilities
zen4-ultra was trained for agentic operation:
- 200–300 sequential tool calls without human intervention
- Heavy mode: 8 simultaneous reasoning trajectories
- Thinking mode with extended chain-of-thought via `<think>` tags
- Built-in browser use and code execution
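The long-horizon tool use described above amounts to a loop that feeds tool results back to the model until it stops requesting tools. A minimal sketch, in which `call_model` and `run_tool` are hypothetical placeholders rather than any zen4-ultra API:

```python
# Minimal sketch of a sequential tool-calling agent loop (hypothetical API).
def run_agent(task, call_model, run_tool, max_steps=300):
    """Feed tool results back to the model until it produces a final answer."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):  # zen4-ultra is trained for 200-300 such steps
        reply = call_model(history)
        history.append(reply)
        if not reply.get("tool_call"):       # no tool requested: final answer
            return reply["content"]
        result = run_tool(reply["tool_call"])  # execute the requested tool
        history.append({"role": "tool", "content": result})
    return None  # step budget exhausted without a final answer
```

The interesting property is not the loop itself but that the model was trained to stay coherent across hundreds of these iterations without a human resetting its context.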
Access
SafeTensors (full precision):

```
huggingface-cli download zenlm/zen4-ultra
```

GGUF (quantized, 2-bit, ~370GB):

```
huggingface-cli download zenlm/zen4-ultra-gguf
```

Both are available now at huggingface.co/zenlm. The SafeTensors release is vanilla Kimi K2.5 weights with Zen branding. The GGUF variant has directional biases removed for fully neutral inference.
License
MIT. Run it, fork it, fine-tune it.
What's Next
We're developing GT-QLoRA (Gate-Targeted QLoRA) — a training method that addresses the fundamental challenge of fine-tuning MoE models for unbiased behavior. Standard directional ablation fails on models like zen4-ultra because any residual bias is encoded in expert routing gates, not just the residual stream.
GT-QLoRA applies LoRA to attention layers and shared experts while directly unfreezing gate weights for gradient updates — producing a model whose neutrality is trained in at every level of the architecture. Training code is at github.com/zenlm/zen4-ultra-trainer.
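The parameter-selection idea behind GT-QLoRA can be illustrated on a toy module. This is a sketch of the concept only, not the zen4-ultra-trainer code, and all module names (`attn`, `gate`, `shared_expert`) are illustrative stand-ins for the real architecture's layers:

```python
import torch
import torch.nn as nn

# Toy stand-in for one MoE block (names are illustrative, not zen4-ultra's).
class ToyMoEBlock(nn.Module):
    def __init__(self, d=8, n_experts=4):
        super().__init__()
        self.attn = nn.Linear(d, d)           # attention proxy (gets LoRA)
        self.gate = nn.Linear(d, n_experts)   # expert router (unfrozen directly)
        self.shared_expert = nn.Linear(d, d)  # shared expert (gets LoRA)

class LoRA(nn.Module):
    """Low-rank adapter: base(x) + (alpha/r) * x A^T B^T, base weight frozen."""
    def __init__(self, base: nn.Linear, r=2, alpha=4):
        super().__init__()
        self.base, self.scale = base, alpha / r
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

def apply_gt_qlora(block: ToyMoEBlock) -> ToyMoEBlock:
    for p in block.parameters():              # freeze the whole base model
        p.requires_grad = False
    block.attn = LoRA(block.attn)             # LoRA on attention...
    block.shared_expert = LoRA(block.shared_expert)  # ...and shared experts
    for p in block.gate.parameters():         # but unfreeze gate weights fully,
        p.requires_grad = True                # so the router itself is trained
    return block
```

The design point is the asymmetry: most of the network learns through low-rank adapters, but the routing gates — where expert-selection bias lives — receive full-rank gradient updates.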