Zen4 is here: a complete lineup of open-weight AI models spanning edge devices to cloud-scale infrastructure, built by Zen LM and Hanzo AI (Techstars '17).
From a 4B model that runs on your phone to a trillion-parameter powerhouse for the most demanding workloads, Zen4 delivers frontier-class performance at every scale. Every model ships with open weights. No gates, no waitlists.
The Zen4 Consumer Line
The consumer line covers every deployment target from mobile to high-end workstations. All consumer models are available in both Instruct and Thinking variants.
| Model | Parameters | Architecture | Base | Context | Target |
|---|---|---|---|---|---|
| Zen4 Mini | 4B | Dense | Qwen3-4B | Standard | Edge and mobile devices |
| Zen4 | 8B | Dense | Qwen3-8B | Standard | Standard desktop inference |
| Zen4 Pro | 14B | Dense | Qwen3-14B | Standard | Professional workloads |
| Zen4 Max | 30B (3B active) | MoE | Qwen3-30B-A3B | 256K | Flagship efficiency |
| Zen4 Max Pro | 80B (3B active) | MoE | Qwen3-Next-80B-A3B | 256K | The best consumer AI |
Zen4 Mini fits comfortably on edge hardware and mobile SoCs. Zen4 and Zen4 Pro handle standard and professional workloads on commodity GPUs. The MoE models -- Zen4 Max and Zen4 Max Pro -- deliver outsized capability with only 3B active parameters per forward pass, making them remarkably efficient for their total parameter count. Both support 256K context windows.
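Curious what "active parameters" means in practice? The sketch below shows top-k expert routing, the core mechanism behind MoE efficiency: a router scores every expert, but only a handful actually run per token. Everything here is a toy illustration with made-up sizes, not the actual Zen4 router.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 16, 2

# Toy experts: each is a tiny feed-forward weight matrix.
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    # The router scores all experts, but only the top-k run. This is why a
    # 30B-total model can do a forward pass with only ~3B active parameters.
    scores = x @ router                # one score per expert
    top = np.argsort(scores)[-top_k:]  # indices of the k best experts
    gate = np.exp(scores[top])
    gate /= gate.sum()                 # softmax over the chosen experts only
    return sum(w * (x @ experts[i]) for w, i in zip(gate, top))

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)  # (64,)
```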
The Zen4 Coder Line
Purpose-built for software engineering. The coder line is optimized for code generation, completion, refactoring, and agentic coding workflows.
| Model | Parameters | Architecture | Base | Context | Target |
|---|---|---|---|---|---|
| Zen4 Coder Flash | 31B (3B active) | MoE | GLM-4.7-Flash | 131K | Fast code generation |
| Zen4 Coder | 80B (3B active) | MoE | Qwen3-Coder-Next | 256K | Flagship agentic coding |
Zen4 Coder Flash is built for speed -- rapid completions, inline suggestions, and fast iteration loops. Zen4 Coder is the flagship: 256K context, full agentic coding support, and deep understanding of complex codebases. Both ship in Instruct and Thinking variants.
The Zen4 Ultra Line (Cloud)
For workloads that demand maximum capability, the Ultra line brings trillion-scale models to cloud deployments.
| Model | Parameters | Architecture | Base | Status |
|---|---|---|---|---|
| Zen4 Ultra | 1.04T (32B active) | MoE | Kimi K2.5 Thinking | Available now |
| Zen4 Ultra Max | TBA | TBA | DeepSeek V4 | Coming soon |
Zen4 Ultra activates 32B parameters from a 1.04 trillion parameter mixture-of-experts model. It represents the current ceiling of open-weight model performance. Zen4 Ultra Max, based on DeepSeek V4, is in development.
Instruct and Thinking Variants
Every Zen4 model ships in two variants:
- Instruct -- Optimized for direct instruction following, chat, and task completion. Low latency, predictable output.
- Thinking -- Extended reasoning with chain-of-thought. Better performance on complex multi-step problems, math, and code analysis.
Choose Instruct for production APIs and interactive applications. Choose Thinking when accuracy on hard problems matters more than speed.
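Picking a variant is just a matter of which checkpoint you load. Here is a minimal Transformers sketch; the repo id is an assumption for illustration, and the actual names on huggingface.co/zenlm may differ.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative repo id -- swap in the Instruct variant for low-latency use.
repo = "zenlm/zen4-8b-thinking"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Thinking variants emit chain-of-thought before the final answer,
# so budget more new tokens than you would for an Instruct model.
output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```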
Available Formats
All Zen4 models are distributed in multiple formats to fit your deployment stack:
| Format | Description | Use Case |
|---|---|---|
| SafeTensors | Native PyTorch-compatible weights | GPU inference, fine-tuning |
| GGUF Q4_K_M | 4-bit quantized | CPU and edge deployment |
| GGUF Q5_K_M | 5-bit quantized | Balanced quality and size |
| GGUF Q6_K | 6-bit quantized | Higher quality, moderate size |
| GGUF Q8_0 | 8-bit quantized | Near-lossless, larger footprint |
| GGUF F16 | 16-bit float | Unquantized GGUF baseline |
| MLX | Apple MLX format | Native Apple Silicon acceleration |
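As one example, the quantized GGUF builds drop straight into llama-cpp-python. This is a minimal sketch with a hypothetical local filename; check the zenlm HuggingFace repos for the exact GGUF asset names.

```python
from llama_cpp import Llama

# Hypothetical filename; see the zenlm repos for the real GGUF names.
llm = Llama(
    model_path="zen4-8b-instruct-Q4_K_M.gguf",  # the 4-bit quant from the table above
    n_ctx=8192,        # context window to allocate
    n_gpu_layers=-1,   # offload every layer to GPU when one is available
)
out = llm("Write a haiku about open weights.", max_tokens=64)
print(out["choices"][0]["text"])
```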
Runs Locally on Apple Silicon
Every consumer and coder model in the Zen4 lineup fits on a 64GB M-series Mac. The MoE architectures are particularly well-suited to unified memory -- with only 3B active parameters per forward pass, inference is fast and responsive even on laptop hardware.
The MLX format provides native Apple Silicon acceleration with no external dependencies. Load a model and start generating.
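With mlx-lm, that looks like the sketch below; the repo id is an assumption for illustration, so browse huggingface.co/zenlm for the published MLX builds.

```python
from mlx_lm import load, generate

# Illustrative repo id; the actual MLX conversion names may differ.
model, tokenizer = load("zenlm/zen4-8b-instruct-mlx")
text = generate(model, tokenizer,
                prompt="Explain unified memory in one sentence.",
                max_tokens=100)
print(text)
```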
Get Zen4
Zen4 models are available now:
- HuggingFace: huggingface.co/zenlm -- all models, all formats, all variants (scripted download sketch after this list)
- Zen LM: zenlm.org -- documentation, benchmarks, and guides
- Hanzo Desktop: Zen4 models are integrated directly into the Hanzo Desktop app for one-click local inference
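For scripted pulls from HuggingFace, huggingface_hub's snapshot_download fetches only the files you need. The repo id and file patterns below are illustrative assumptions; adjust them to the model and format you want.

```python
from huggingface_hub import snapshot_download

# Illustrative repo id and patterns -- not the definitive catalog entry.
local_dir = snapshot_download(
    "zenlm/zen4-8b-instruct",
    allow_patterns=["*.safetensors", "*.json"],  # skip GGUF/MLX files you don't need
)
print(local_dir)  # local cache path containing the downloaded weights
```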
Built by Zen LM and Hanzo AI
Zen4 is the product of Zen LM and Hanzo AI. We build open foundation models because open weights accelerate the entire field. The best models should be available to everyone -- researchers, engineers, startups, and enterprises alike.
Hanzo AI is Techstars '17. We have been building AI infrastructure since before it was fashionable.
Download Zen4 today. Build something remarkable.