Most AI platforms force a choice: use their models or bring your own. We eliminated that choice.
Today we're launching the Hanzo AI Gateway — a single OpenAI-compatible endpoint that gives developers access to over 100 AI models from every major provider, plus 14 proprietary Zen models built on our Mixture of Distilled Experts (MoDE) architecture. One API key. One bill. Zero markup on third-party models.
What Makes This Different
Every competing "unified API" adds a margin on top of provider pricing. We don't. When you call Claude Opus 4.6, GPT-5, Gemini 2.5 Pro, DeepSeek R1, or Llama 4 through Hanzo, you pay the same per-token rate you'd pay directly — but you get unified billing, automatic failover, request routing, and a single integration to maintain.
Featured third-party models available at launch:
| Provider | Models |
|---|---|
| Anthropic | Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5 |
| OpenAI | GPT-5, GPT-5 Mini |
| Google | Gemini 2.5 Pro |
| DeepSeek | DeepSeek R1, DeepSeek Chat |
| Meta | Llama 4 Maverick |
| Mistral | Mistral Large |
| Cohere | Command A |
| xAI | Grok 3 |
All models are accessible through a standard /v1/chat/completions endpoint. If your code works with OpenAI's SDK, it works with Hanzo. Change the base URL and you're done.
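As a rough illustration of "change the base URL," here is a minimal sketch using only Python's standard library. The base URL and the `zen4` model ID are taken from the curl example in this post. The helper name `build_chat_request` and the placeholder key are assumptions made for illustration, and the request is only constructed here, not sent.

```python
import json
import os
import urllib.request

# Base URL as it appears in the curl example in this post.
HANZO_BASE_URL = "https://llm.hanzo.ai/v1"


def build_chat_request(base_url, api_key, model, messages):
    """Build (but do not send) an OpenAI-style chat completions request.

    Hypothetical helper: only the base URL differs between providers
    that expose the same /v1/chat/completions contract.
    """
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    base_url=HANZO_BASE_URL,
    # Falls back to a placeholder so the sketch runs without a real key.
    api_key=os.environ.get("HANZO_API_KEY", "sk-placeholder"),
    model="zen4",
    messages=[{"role": "user", "content": "Hello"}],
)
print(request.get_method(), request.full_url)
# To send it for real: urllib.request.urlopen(request), then json.load() the response.
```

If you use the official OpenAI Python SDK instead, the equivalent change is usually just the `base_url` argument passed to the client constructor.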
The Zen4 Model Family
Alongside third-party access, the gateway serves our proprietary Zen models, led by the Zen4 family:
- Zen4 Flagship — 744B parameters (40B active), MoE architecture, 202K context window
- Zen4 Ultra — Extended chain-of-thought reasoning on MoE, 262K context
- Zen4 Max — 1M token context window for processing entire codebases or document corpora
- Zen4 Coder — 480B-parameter code specialist with 163K context
- Zen4 Mini — Free-tier model for prototyping at zero cost
- Zen3 Omni — Multimodal (text + vision + audio) in a single call
Every Zen model runs on our own infrastructure and is priced by us directly, with no middleman and no markup.
Beyond Text: Full Multimodal Stack
The gateway isn't limited to chat completions:
| Capability | Endpoint | Starting Price |
|---|---|---|
| Embeddings | /v1/embeddings | Token-based |
| Reranking | /v1/rerank | Token-based |
| Image generation | /v1/images/generations | $0.04/image |
| Speech-to-text | /v1/audio/transcriptions | $0.0009/minute |
| Content safety | /v1/chat/completions (zen3-guard) | Token-based |
Our speech-to-text pricing deserves special attention: Zen3 Audio Fast transcribes audio at $0.0009 per minute — that's 85% cheaper than OpenAI's Whisper API at $0.006/minute. For a startup transcribing 10,000 hours of audio monthly, that's $540 vs $3,600.
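The comparison above can be checked with a few lines of arithmetic. This sketch uses the two per-minute prices quoted in this post and `Decimal` to avoid floating-point rounding. The function name is illustrative, and the workload (10,000 hours per month) is the one from the example.

```python
from decimal import Decimal

ZEN3_AUDIO_FAST_PER_MIN = Decimal("0.0009")  # USD per minute, from this post
WHISPER_API_PER_MIN = Decimal("0.006")       # USD per minute, comparison figure from this post


def monthly_transcription_cost(hours, price_per_minute):
    """Cost in USD for transcribing `hours` of audio at a per-minute price."""
    minutes = Decimal(hours) * 60
    return minutes * price_per_minute


hours = 10_000
zen_cost = monthly_transcription_cost(hours, ZEN3_AUDIO_FAST_PER_MIN)
whisper_cost = monthly_transcription_cost(hours, WHISPER_API_PER_MIN)
savings_pct = (1 - zen_cost / whisper_cost) * 100

print(f"Zen3 Audio Fast: ${zen_cost:,.2f}")
print(f"Whisper API:     ${whisper_cost:,.2f}")
print(f"Savings:         {savings_pct:.0f}%")
```

10,000 hours is 600,000 minutes, so the two totals come out to $540 and $3,600, and the ratio gives the 85% saving.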
Integrated Tools
The gateway also provides built-in tool capabilities for agent workflows:
| Tool | Price |
|---|---|
| Web Search | $0.005/query |
| Code Interpreter | $0.03/session-minute |
| File Storage | $0.20/GB/month |
| Text-to-Speech | $15.00/1M characters |
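To get a feel for how these unit prices add up, here is a small estimator built from the table. The prices are the ones listed above. The dictionary keys, the function name, and the monthly workload are hypothetical and exist only for this sketch.

```python
from decimal import Decimal

# Unit prices in USD, copied from the tool pricing table.
# The key names are illustrative, not API identifiers.
TOOL_PRICES = {
    "web_search_queries": Decimal("0.005"),          # per query
    "code_interpreter_minutes": Decimal("0.03"),     # per session-minute
    "file_storage_gb_months": Decimal("0.20"),       # per GB per month
    "tts_characters": Decimal("15.00") / 1_000_000,  # per character
}


def estimate_tool_costs(usage):
    """Return per-tool costs plus a 'total' entry, all in USD."""
    costs = {name: TOOL_PRICES[name] * Decimal(qty) for name, qty in usage.items()}
    costs["total"] = sum(costs.values(), Decimal("0"))
    return costs


# Hypothetical monthly workload for a small agent deployment.
usage = {
    "web_search_queries": 20_000,
    "code_interpreter_minutes": 500,
    "file_storage_gb_months": 50,
    "tts_characters": 2_000_000,
}

for name, cost in estimate_tool_costs(usage).items():
    print(f"{name:28s} ${cost:,.2f}")
```

For this assumed workload the line items are $100 for search, $15 for code interpreter time, $10 for storage, and $30 for text-to-speech, for a total of $155.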
Getting Started
curl https://llm.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen4",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Sign up at hanzo.ai and get a $5 credit to start. No credit card required for free-tier models.
The gateway is live now. One API. Every model. No markup.