Zen4 is here: a complete lineup of open-weight AI models spanning edge devices to cloud-scale infrastructure, built by Zen LM and Hanzo AI (Techstars '17).
From a 4B model that runs on your phone to a trillion-parameter powerhouse for the most demanding workloads, Zen4 delivers frontier-class performance at every scale. Every model ships with open weights. No gates, no waitlists.
The Zen4 Consumer Line
The consumer line covers every deployment target from mobile to high-end workstations. All consumer models are available in both Instruct and Thinking variants.
| Model | Parameters | Architecture | Base | Context | Target |
|---|---|---|---|---|---|
| Zen4 Mini | 4B | Dense | Qwen3-4B | Standard | Edge and mobile devices |
| Zen4 | 8B | Dense | Qwen3-8B | Standard | Standard desktop inference |
| Zen4 Pro | 14B | Dense | Qwen3-14B | Standard | Professional workloads |
| Zen4 Max | 30B (3B active) | MoE | Qwen3-30B-A3B | 256K | Flagship efficiency |
| Zen4 Max Pro | 80B (3B active) | MoE | Qwen3-Next-80B-A3B | 256K | The best consumer AI |
Zen4 Mini fits comfortably on edge hardware and mobile SoCs. Zen4 and Zen4 Pro handle standard and professional workloads on commodity GPUs. The MoE models -- Zen4 Max and Zen4 Max Pro -- deliver outsized capability with only 3B active parameters per forward pass, making them remarkably efficient for their total parameter count. Both support 256K context windows.
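Curious what "active parameters" means in practice? The sketch below shows top-k expert routing, the core mechanism behind MoE efficiency: a router scores every expert, but only a handful actually run per token. Everything here is a toy illustration with made-up sizes, not the actual Zen4 router.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 16, 2

# Toy experts: each is a tiny feed-forward weight matrix.
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    # The router scores all experts, but only the top-k run. This is why a
    # 30B-total model can do a forward pass with only ~3B active parameters.
    scores = x @ router                # one score per expert
    top = np.argsort(scores)[-top_k:]  # indices of the k best experts
    gate = np.exp(scores[top])
    gate /= gate.sum()                 # softmax over the chosen experts only
    return sum(w * (x @ experts[i]) for w, i in zip(gate, top))

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)  # (64,)
```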
The Zen4 Coder Line
Purpose-built for software engineering. The coder line is optimized for code generation, completion, refactoring, and agentic coding workflows.
| Model | Parameters | Architecture | Base | Context | Target |
|---|---|---|---|---|---|
| Zen4 Coder Flash | 31B (3B active) | MoE | GLM-4.7-Flash | 131K | Fast code generation |
| Zen4 Coder | 80B (3B active) | MoE | Qwen3-Coder-Next | 256K | Flagship agentic coding |
Zen4 Coder Flash is built for speed -- rapid completions, inline suggestions, and fast iteration loops. Zen4 Coder is the flagship: 256K context, full agentic coding support, and deep understanding of complex codebases. Both ship in Instruct and Thinking variants.
The Zen4 Ultra Line (Cloud)
For workloads that demand maximum capability, the Ultra line brings trillion-scale models to cloud deployments.
| Model | Parameters | Architecture | Base | Status |
|---|---|---|---|---|
| Zen4 Ultra | 1.04T (32B active) | MoE | Kimi K2.5 Thinking | Available now |
| Zen4 Ultra Max | TBA | TBA | DeepSeek V4 | Coming soon |
Zen4 Ultra activates 32B parameters from a 1.04 trillion parameter mixture-of-experts model. It represents the current ceiling of open-weight model performance. Zen4 Ultra Max, based on DeepSeek V4, is in development.
Instruct and Thinking Variants
Every Zen4 model ships in two variants:
- Instruct -- Optimized for direct instruction following, chat, and task completion. Low latency, predictable output.
- Thinking -- Extended reasoning with chain-of-thought. Better performance on complex multi-step problems, math, and code analysis.
Choose Instruct for production APIs and interactive applications. Choose Thinking when accuracy on hard problems matters more than speed.
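Picking a variant is just a matter of which checkpoint you load. Here is a minimal Transformers sketch; the repo id is an assumption for illustration, and the actual names on huggingface.co/zenlm may differ.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative repo id -- swap in the Instruct variant for low-latency use.
repo = "zenlm/zen4-8b-thinking"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Thinking variants emit chain-of-thought before the final answer,
# so budget more new tokens than you would for an Instruct model.
output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```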
Available Formats
All Zen4 models are distributed in multiple formats to fit your deployment stack:
| Format | Description | Use Case |
|---|---|---|
| SafeTensors | Native PyTorch-compatible weights | GPU inference, fine-tuning |
| GGUF Q4_K_M | 4-bit quantized | CPU and edge deployment |
| GGUF Q5_K_M | 5-bit quantized | Balanced quality and size |
| GGUF Q6_K | 6-bit quantized | Higher quality, moderate size |
| GGUF Q8_0 | 8-bit quantized | Near-lossless, larger footprint |
| GGUF F16 | 16-bit float | Unquantized GGUF baseline |
| MLX | Apple MLX format | Native Apple Silicon acceleration |
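As one example, the quantized GGUF builds drop straight into llama-cpp-python. This is a minimal sketch with a hypothetical local filename; check the zenlm HuggingFace repos for the exact GGUF asset names.

```python
from llama_cpp import Llama

# Hypothetical filename; see the zenlm repos for the real GGUF names.
llm = Llama(
    model_path="zen4-8b-instruct-Q4_K_M.gguf",  # the 4-bit quant from the table above
    n_ctx=8192,        # context window to allocate
    n_gpu_layers=-1,   # offload every layer to GPU when one is available
)
out = llm("Write a haiku about open weights.", max_tokens=64)
print(out["choices"][0]["text"])
```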
Runs Locally on Apple Silicon
Every consumer and coder model in the Zen4 lineup fits on a 64GB M-series Mac. The MoE architectures are particularly well-suited to unified memory -- with only 3B active parameters per forward pass, inference is fast and responsive even on laptop hardware.
The MLX format provides native Apple Silicon acceleration with no external dependencies. Load a model and start generating.
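With mlx-lm, that looks like the sketch below; the repo id is an assumption for illustration, so browse huggingface.co/zenlm for the published MLX builds.

```python
from mlx_lm import load, generate

# Illustrative repo id; the actual MLX conversion names may differ.
model, tokenizer = load("zenlm/zen4-8b-instruct-mlx")
text = generate(model, tokenizer,
                prompt="Explain unified memory in one sentence.",
                max_tokens=100)
print(text)
```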
Get Zen4
Zen4 models are available now:
- HuggingFace: huggingface.co/zenlm -- all models, all formats, all variants (scripted download sketch after this list)
- Zen LM: zenlm.org -- documentation, benchmarks, and guides
- Hanzo Desktop: Zen4 models are integrated directly into the Hanzo Desktop app for one-click local inference
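For scripted pulls from HuggingFace, huggingface_hub's snapshot_download fetches only the files you need. The repo id and file patterns below are illustrative assumptions; adjust them to the model and format you want.

```python
from huggingface_hub import snapshot_download

# Illustrative repo id and patterns -- not the definitive catalog entry.
local_dir = snapshot_download(
    "zenlm/zen4-8b-instruct",
    allow_patterns=["*.safetensors", "*.json"],  # skip GGUF/MLX files you don't need
)
print(local_dir)  # local cache path containing the downloaded weights
```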
Built by Zen LM and Hanzo AI
Zen4 is the product of Zen LM and Hanzo AI. We build open foundation models because open weights accelerate the entire field. The best models should be available to everyone -- researchers, engineers, startups, and enterprises alike.
Hanzo AI is Techstars '17. We have been building AI infrastructure since before it was fashionable.
Download Zen4 today. Build something remarkable.