Dense models waste 90% of their parameters on every token. For code — where knowledge is highly structured across syntax, semantics, type systems, and natural language — this waste is especially acute.
Today we're launching Zen4 Coder, a 480-billion parameter Mixture of Experts model that activates only 35B parameters per forward pass. It knows as much as a 480B model. It costs as much as a 35B model.
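The economics are simple back-of-envelope arithmetic. The sketch below uses the parameter counts from this announcement; the 2-FLOPs-per-active-parameter rule of thumb is a standard approximation for transformer inference, not an official benchmark.

```python
# Per-token compute scales with ACTIVE parameters;
# knowledge capacity scales with TOTAL parameters.
TOTAL_PARAMS = 480e9   # Zen4 Coder total parameters
ACTIVE_PARAMS = 35e9   # parameters activated per forward pass

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per token (~2 FLOPs per active param)."""
    return 2 * active_params

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active fraction: {active_fraction:.1%}")             # ~7.3% of weights per token
print(f"FLOPs per token: {flops_per_token(ACTIVE_PARAMS):.1e}")
```

Roughly 93% of the weights sit idle on any given token, which is where the cost savings come from.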
Three Models, Every Code Workflow
Zen4 Coder — The general-purpose code model.
- 480B total / 35B active (MoE)
- 163K context window
- All major programming languages
- Agentic tool use and multi-step reasoning
Zen4 Coder Pro — Maximum accuracy.
- 480B parameters, full BF16 precision (Dense)
- 131K context window
- For complex multi-file refactoring and large codebase navigation
- When you need the best answer, not the fastest
Zen4 Coder Flash — Real-time code assistance.
- 30B total / 3B active (MoE)
- 262K context window
- Inline completions, tab-complete, and code chat
- Sub-100ms latency for interactive workflows
Why MoE Architecture Matters for Code
The Mixture of Experts (MoE) architecture routes each token to specialized expert subnetworks. In Zen4 Coder, this creates dedicated pathways for:
- Syntax experts — Language-specific grammar and structure
- Semantic experts — Meaning, intent, and program logic
- Type inference experts — Type systems, generics, and constraints
- Documentation experts — Natural language in comments, docstrings, and explanations
When generating Python, the Python syntax experts activate alongside the semantic reasoning experts. The Java type system experts stay dormant. This selective activation is what makes a 480B model run like a 35B model.
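The routing mechanism behind this selective activation can be sketched in a few lines. This is a generic top-k MoE layer, not Zen4 Coder's actual router: the expert count, k, and weights here are illustrative placeholders, since the post doesn't publish those internals.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # illustrative; real expert counts are much larger
TOP_K = 2         # experts activated per token
D_MODEL = 16      # toy hidden dimension

router_weights = rng.normal(size=(D_MODEL, NUM_EXPERTS))
experts = [rng.normal(size=(D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route one token to its top-k experts and mix their outputs."""
    logits = token @ router_weights                          # score every expert
    top = np.argsort(logits)[-TOP_K:]                        # pick the top-k
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over top-k
    # Only TOP_K of NUM_EXPERTS experts run; the rest stay dormant.
    return sum(g * (token @ experts[i]) for g, i in zip(gates, top))

out = moe_layer(rng.normal(size=D_MODEL))
print(out.shape)  # (16,)
```

The key property: compute per token is proportional to `TOP_K`, not `NUM_EXPERTS`, while total capacity grows with `NUM_EXPERTS`.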
Context Windows Built for Real Codebases
The 163K context window on Zen4 Coder fits:
- A typical microservice codebase (50-100 files)
- An entire Go module or Rust crate
- A full React application with components, hooks, and state management
The 262K window on Coder Flash handles even larger workloads for code search and navigation.
The 131K window on Coder Pro is optimized for deep analysis — trading context length for maximum accuracy on complex reasoning tasks.
Available Now
All three Zen4 Coder models are available through the Hanzo AI Gateway at hanzo.ai. Use the standard OpenAI-compatible API:
curl https://llm.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen4-coder",
    "messages": [{"role": "user", "content": "Refactor this function to use async/await..."}]
  }'

Same API, same key, same billing as every other model on the platform.
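The same request works from any language. This Python sketch only constructs the JSON body from the curl example above (actually sending it requires a valid `HANZO_API_KEY`), and assumes the standard OpenAI chat-completions schema the post references.

```python
import json

# Endpoint and model name come from the announcement.
ENDPOINT = "https://llm.hanzo.ai/v1/chat/completions"

def build_request(model: str, prompt: str) -> str:
    """Build an OpenAI-compatible chat-completions request body."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(payload)

body = build_request("zen4-coder", "Refactor this function to use async/await...")
print(body)
```

POST `body` to `ENDPOINT` with the `Authorization` and `Content-Type` headers shown in the curl example.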
Read more
One API for Every AI Model: Introducing the Hanzo AI Gateway
Hanzo AI launches the industry's first zero-markup multi-provider AI gateway — one API key for 100+ models from every major provider, plus 14 proprietary Zen models.
zen4-ultra: A Trillion-Parameter AI Model, Open and Free
zen4-ultra brings Kimi K2.5 — a 1.04 trillion-parameter Mixture of Experts model — to the Zen AI family. Open weights, 256K context, 71% SWE-bench. Available now on HuggingFace.
Zen4: Unbiased AI Models for Every Scale
Announcing the full Zen4 family: mini (4B) through ultra (1T MoE), all unbiased. Eight models covering every scale from edge to cloud — neutral, unconstrained, and built for agents, infrastructure, and the open internet.