Today we are releasing Zen Translator, a model built specifically for high-quality machine translation across more than 100 languages.
General-purpose language models can translate, but they are not optimized for it. Zen Translator is trained from the ground up on translation data with explicit objectives for faithfulness, fluency, and domain consistency. It handles the things that trip up general models: maintaining consistent terminology across a document, preserving formatting and markup, adapting tone to the target language's conventions, and translating idiomatic expressions without literal rendering.
Coverage
| Tier | Languages | Description |
|---|---|---|
| Tier 1 | EN, ZH, ES, FR, DE, JA, KO, PT, IT, RU | Best quality, widest training data |
| Tier 2 | AR, HI, NL, PL, TR, VI, SV, DA, CS, RO | Strong quality, extensive evaluation |
| Tier 3 | 90+ additional languages | Good quality, less evaluation coverage |
Full language list at zenlm.org/translator.
Document-Level Coherence
Most MT models translate sentence by sentence, losing document context. Zen Translator processes multi-sentence context windows (up to 8,192 tokens) and maintains consistency across:
- Named entities and proper nouns
- Technical terminology and domain vocabulary
- Pronoun reference chains (especially critical for languages with gendered pronouns)
- Formal/informal register
A document translated with Zen Translator reads like it was written in the target language, not like a concatenation of independently translated sentences.
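Documents longer than the context window still need to be split before translation. The following is a minimal sketch of one way to do that, not the method Zen Translator uses internally: it greedily packs consecutive paragraphs into chunks under a token budget. The function name `pack_paragraphs` and the whitespace-based token counter are assumptions made for illustration; the 8,192 default is the context limit stated above.

```python
from typing import Callable, List

def pack_paragraphs(
    paragraphs: List[str],
    max_tokens: int = 8192,
    # Assumption: whitespace splitting is only a stand-in for a real tokenizer count
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Greedily group consecutive paragraphs into chunks that fit the token budget."""
    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for para in paragraphs:
        cost = count_tokens(para)
        # Start a new chunk when adding this paragraph would exceed the budget
        if current and used + cost > max_tokens:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        # A single paragraph larger than the budget becomes its own oversized chunk
        current.append(para)
        used += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks

example_chunks = pack_paragraphs(["a b c", "d e", "f g h i"], max_tokens=5)
```

In practice the budget would probably need to be set below 8,192 to leave room for the prompt text, and `count_tokens` would be replaced with a count from the model's own tokenizer. Keeping whole paragraphs together means each chunk carries the surrounding sentences the model needs for consistent entities and pronouns.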
Domain Adaptation
Specialized content has its own vocabulary. Zen Translator was trained with domain-specific terminology sets for:
- Legal: Contract language, regulatory terms, legal procedure
- Medical: Clinical terminology, drug names, diagnostic language
- Technical: Software documentation, engineering specs, API references
- Financial: Market terminology, accounting terms, regulatory filings
- Marketing: Idioms, cultural adaptation, tone matching
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the weights in bfloat16 and place them on the available devices
model = AutoModelForCausalLM.from_pretrained(
    "zenlm/zen-translator",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("zenlm/zen-translator")

def translate(text: str, source: str, target: str, domain: str = "general") -> str:
    prompt = f"""Translate the following {source} text to {target}.
Domain: {domain}
Preserve all formatting, technical terms, and proper nouns.
Text:
{text}
Translation:"""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=len(text) * 2,  # rough budget based on character count
        do_sample=True,  # required for temperature to take effect
        temperature=0.3,
    )
    # Decode only the newly generated tokens, skipping the prompt
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Technical documentation
result = translate(
    "The API rate limit is 1000 requests per minute per API key.",
    source="English", target="Japanese", domain="technical"
)
```
Benchmarks
| Direction | Zen Translator (BLEU) | NLLB-3.3B (BLEU) | DeepL API (BLEU) |
|---|---|---|---|
| EN → ZH | 28.4 | 25.1 | 28.8 |
| EN → DE | 32.1 | 28.7 | 33.2 |
| EN → JA | 24.7 | 21.3 | 25.1 |
| ZH → EN | 26.3 | 23.8 | 27.4 |
| ES → FR | 35.8 | 32.1 | 36.4 |
| AR → EN | 22.1 | 18.9 | 23.8 |
BLEU scores on WMT 2024 test sets. Zen Translator scores within 0.4 to 1.7 BLEU of the DeepL API on every pair listed and 2.5 to 3.7 BLEU above NLLB-3.3B, while running fully on-premise.
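The per-direction gaps can be tallied directly from the table. The sketch below only restates the published numbers and does arithmetic on them; the variable and function names are illustrative, and no new measurements are involved.

```python
# BLEU scores copied from the table above: (Zen Translator, NLLB-3.3B, DeepL API)
scores = {
    "EN->ZH": (28.4, 25.1, 28.8),
    "EN->DE": (32.1, 28.7, 33.2),
    "EN->JA": (24.7, 21.3, 25.1),
    "ZH->EN": (26.3, 23.8, 27.4),
    "ES->FR": (35.8, 32.1, 36.4),
    "AR->EN": (22.1, 18.9, 23.8),
}

def gaps(table):
    """Return (zen - nllb, zen - deepl) per direction, rounded to one decimal."""
    return {
        direction: (round(zen - nllb, 1), round(zen - deepl, 1))
        for direction, (zen, nllb, deepl) in table.items()
    }

def mean(values):
    values = list(values)
    return sum(values) / len(values)

per_direction = gaps(scores)
mean_vs_nllb = round(mean(g[0] for g in per_direction.values()), 2)
mean_vs_deepl = round(mean(g[1] for g in per_direction.values()), 2)

for direction, (vs_nllb, vs_deepl) in per_direction.items():
    print(f"{direction}: {vs_nllb:+.1f} vs NLLB-3.3B, {vs_deepl:+.1f} vs DeepL API")
print(f"mean: {mean_vs_nllb:+.2f} vs NLLB-3.3B, {mean_vs_deepl:+.2f} vs DeepL API")
```

Averaged over the six directions, that works out to about 3.25 BLEU above NLLB-3.3B and about 0.88 BLEU below the DeepL API.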
Markup Preservation
Zen Translator handles HTML, Markdown, and XML without breaking tags:
```python
html_text = """<h1>Getting Started</h1>
<p>Install the <code>package</code> using:</p>
<pre><code>npm install @hanzo/sdk</code></pre>"""
result = translate(html_text, "English", "French", domain="technical")
# Output preserves all HTML tags; only human-readable text is translated
```
Deployment
```bash
# Hanzo Cloud API
curl https://api.hanzo.ai/v1/translate \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen-translator",
    "text": "The system processes requests asynchronously.",
    "source": "en",
    "target": "de",
    "domain": "technical"
  }'

# Batch translation
# (headers assumed to match the single-text endpoint above)
curl https://api.hanzo.ai/v1/translate/batch \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"texts": ["sentence 1", "sentence 2"], "source": "en", "target": "zh"}'
```
Specs
| Property | Value |
|---|---|
| Parameters | 7B |
| Architecture | Encoder-decoder (seq2seq) |
| Source Languages | 100+ |
| Target Languages | 100+ |
| Max Input Tokens | 8,192 |
| License | Apache 2.0 |
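For on-premise sizing, the parameter count gives a rough lower bound on memory. The sketch below is back-of-the-envelope arithmetic only: it treats "7B" as exactly 7 billion parameters (an assumption, since the exact count is not given) and counts weights alone, ignoring activations and any generation cache. The function name is illustrative.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for the model weights only, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9

NUM_PARAMS = 7e9  # assumption: "7B" taken as exactly 7 billion parameters

bf16_gb = weight_memory_gb(NUM_PARAMS, 2)  # bfloat16 stores each parameter in 2 bytes
fp32_gb = weight_memory_gb(NUM_PARAMS, 4)  # float32 stores each parameter in 4 bytes

print(f"bfloat16 weights: ~{bf16_gb:.0f} GB, float32 weights: ~{fp32_gb:.0f} GB")
```

Loading with `torch_dtype=torch.bfloat16`, as in the Python example earlier, therefore needs roughly 14 GB for the weights, about half of what float32 would require.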
Get Zen Translator
- HuggingFace: huggingface.co/zenlm/zen-translator
- Hanzo Cloud API: zen-translator model at api.hanzo.ai/v1/translate
- Zen LM: zenlm.org (localization pipeline guides)
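For the Hanzo Cloud API route, the request from the curl example in the Deployment section can also be built from Python. This is a minimal sketch using only the standard library: the endpoint, headers, and JSON fields are taken from that curl example, while the helper name `build_translate_request` and the placeholder key are hypothetical. The response format is not documented in this post, so the sketch stops at constructing the request.

```python
import json
import urllib.request

API_URL = "https://api.hanzo.ai/v1/translate"  # endpoint from the curl example

def build_translate_request(
    text: str,
    source: str,
    target: str,
    api_key: str,
    domain: str,
    model: str = "zen-translator",
) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "model": model,
        "text": text,
        "source": source,
        "target": target,
        "domain": domain,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_translate_request(
    "The system processes requests asynchronously.",
    source="en",
    target="de",
    api_key="YOUR_API_KEY",  # placeholder; read the real key from the environment
    domain="technical",
)
# Sending is left out on purpose: urllib.request.urlopen(req) would perform the
# call, and the shape of the response body is not specified in this post.
```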
Zach Kelling is the founder of Hanzo AI, Techstars '17.