zoo/ blog

Zen Translator: High-Quality Machine Translation for 100+ Languages

Zen Translator is a specialized translation model covering 100+ languages with document-level coherence, domain adaptation, and tone-preserving translation for production localization pipelines.

Today we are releasing Zen Translator, a model built specifically for high-quality machine translation across more than 100 languages.

General-purpose language models can translate, but they are not optimized for it. Zen Translator is trained from the ground up on translation data with explicit objectives for faithfulness, fluency, and domain consistency. It handles the things that trip up general models: maintaining consistent terminology across a document, preserving formatting and markup, adapting tone to the target language's conventions, and translating idiomatic expressions without literal rendering.

Coverage

| Tier | Languages | Description |
|------|-----------|-------------|
| Tier 1 | EN, ZH, ES, FR, DE, JA, KO, PT, IT, RU | Best quality, widest training data |
| Tier 2 | AR, HI, NL, PL, TR, VI, SV, DA, CS, RO | Strong quality, extensive evaluation |
| Tier 3 | 90+ additional languages | Good quality, less evaluation coverage |

Full language list at zenlm.org/translator.

Document-Level Coherence

Most machine translation (MT) models translate sentence by sentence and lose document context. Zen Translator processes multi-sentence context windows of up to 8,192 tokens and maintains consistency across:

  • Named entities and proper nouns
  • Technical terminology and domain vocabulary
  • Pronoun reference chains (especially critical for languages with gendered pronouns)
  • Formal/informal register

A document translated with Zen Translator reads like it was written in the target language, not like a concatenation of independently translated sentences.
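
One way to feed a long document to a model with an 8,192-token window is to pack whole paragraphs into chunks that stay under a token budget, so context is never cut mid-paragraph. The sketch below is illustrative and not part of Zen Translator: `pack_paragraphs`, the `count_tokens` callback, and the word-count stand-in are all assumptions. In practice the counter would wrap the model's tokenizer.

```python
from typing import Callable, List


def pack_paragraphs(
    paragraphs: List[str],
    count_tokens: Callable[[str], int],
    budget: int = 8192,
) -> List[List[str]]:
    """Greedily group consecutive paragraphs into chunks of at most `budget` tokens.

    Hypothetical helper: a paragraph that alone exceeds the budget is placed
    in its own chunk and left for the caller to split further.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for paragraph in paragraphs:
        cost = count_tokens(paragraph)
        # Start a new chunk when adding this paragraph would overflow the budget
        if current and used + cost > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(paragraph)
        used += cost
    if current:
        chunks.append(current)
    return chunks


# A whitespace word count stands in for a real tokenizer here (an assumption)
def word_count(text: str) -> int:
    return len(text.split())


doc = ["one two three", "four five", "six seven eight nine", "ten"]
chunks = pack_paragraphs(doc, word_count, budget=5)
```

When the chunks are sent through a prompt template, the budget should be set below 8,192 so that the instructions and the generated translation also fit.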

Domain Adaptation

Specialized content has its own vocabulary. Zen Translator was trained with domain-specific terminology sets for:

  • Legal: Contract language, regulatory terms, legal procedure
  • Medical: Clinical terminology, drug names, diagnostic language
  • Technical: Software documentation, engineering specs, API references
  • Financial: Market terminology, accounting terms, regulatory filings
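  • Marketing: Idioms, cultural adaptation, tone matching

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the weights in bfloat16 and let device_map place them on the available devices
model = AutoModelForCausalLM.from_pretrained(
    "zenlm/zen-translator",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("zenlm/zen-translator")

def translate(text: str, source: str, target: str, domain: str = "general") -> str:
    prompt = f"""Translate the following {source} text to {target}.
Domain: {domain}
Preserve all formatting, technical terms, and proper nouns.

Text:
{text}

Translation:"""

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Budget the output in tokens, not characters: allow up to 2x the source token count
    source_tokens = len(tokenizer(text)["input_ids"])
    outputs = model.generate(
        **inputs,
        max_new_tokens=max(32, 2 * source_tokens),
        do_sample=True,  # temperature only takes effect when sampling is enabled
        temperature=0.3,
    )
    # Drop the prompt tokens and decode only the generated continuation
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)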

# Technical documentation
result = translate(
    "The API rate limit is 1000 requests per minute per API key.",
    source="English", target="Japanese", domain="technical"
)

Benchmarks

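| Direction | Zen Translator (BLEU) | NLLB-3.3B (BLEU) | DeepL API (BLEU) |
|-----------|-----------------------|------------------|------------------|
| EN → ZH | 28.4 | 25.1 | 28.8 |
| EN → DE | 32.1 | 28.7 | 33.2 |
| EN → JA | 24.7 | 21.3 | 25.1 |
| ZH → EN | 26.3 | 23.8 | 27.4 |
| ES → FR | 35.8 | 32.1 | 36.4 |
| AR → EN | 22.1 | 18.9 | 23.8 |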

BLEU scores on WMT 2024 test sets. Zen Translator comes within 0.4 to 1.7 BLEU of the DeepL API on these language pairs and leads NLLB-3.3B by 2.5 to 3.7 BLEU, while running fully on-premise.
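
For readers unfamiliar with the metric, BLEU combines clipped n-gram precisions (orders 1 through 4) with a brevity penalty for short output. The following is a minimal sketch of that formula, assuming whitespace tokenization, one reference per hypothesis, and no smoothing. It is not the evaluation pipeline behind the table, and because published scores are normally produced with a standardized tool such as sacreBLEU, its numbers are not directly comparable.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clipped counts: an n-gram is credited at most as often as the reference has it
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty punishes output shorter than the reference
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


score = corpus_bleu(["the cat sat on a mat"], ["the cat sat on the mat"])
print(f"{score:.1f}")
```

A single substituted word lowers the precision at every n-gram order, which is why small wording differences move BLEU noticeably on short test sets.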

Markup Preservation

Zen Translator handles HTML, Markdown, and XML without breaking tags:

html_text = """<h1>Getting Started</h1>
<p>Install the <code>package</code> using:</p>
<pre><code>npm install @hanzo/sdk</code></pre>"""

result = translate(html_text, "English", "French", domain="technical")
# Output preserves all HTML tags; only human-readable text is translated
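
A localization pipeline may want to verify tag preservation automatically. The sketch below compares the sequence of tags in the source and the translation using Python's standard `html.parser`. The names `TagCollector`, `tag_sequence`, and `tags_preserved` are hypothetical, and the French string is a hand-written stand-in, not actual model output.

```python
from html.parser import HTMLParser
from typing import List


class TagCollector(HTMLParser):
    """Records the sequence of opening and closing tags, ignoring text content."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")


def tag_sequence(markup: str) -> List[str]:
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def tags_preserved(source: str, translated: str) -> bool:
    """True when both documents contain the same tags in the same order."""
    return tag_sequence(source) == tag_sequence(translated)


source_html = """<h1>Getting Started</h1>
<p>Install the <code>package</code> using:</p>
<pre><code>npm install @hanzo/sdk</code></pre>"""

# Hand-written French version used only to exercise the check
french_html = """<h1>Premiers pas</h1>
<p>Installez le <code>package</code> avec :</p>
<pre><code>npm install @hanzo/sdk</code></pre>"""

ok = tags_preserved(source_html, french_html)
```

A strict order comparison can flag valid translations in which word order legitimately moves an inline tag, so a production check might compare tag counts instead.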

Deployment

# Self-hosted via API
curl https://api.hanzo.ai/v1/translate \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen-translator",
    "text": "The system processes requests asynchronously.",
    "source": "en",
    "target": "de",
    "domain": "technical"
  }'

# Batch translation
curl https://api.hanzo.ai/v1/translate/batch \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"texts": ["sentence 1", "sentence 2"], "source": "en", "target": "zh"}'
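
The same single-text request can be issued from Python with only the standard library. This is a sketch, not an official client: the URL, headers, and payload fields mirror the curl example above, while `build_request`, `translate_remote`, and the `"general"` default domain are assumptions. The response schema is not documented here, so the decoded JSON is returned without assuming any field names.

```python
import json
import os
import urllib.request
from typing import Any, Dict

API_URL = "https://api.hanzo.ai/v1/translate"  # endpoint taken from the curl example


def build_request(text: str, source: str, target: str,
                  domain: str = "general", api_key: str = "") -> urllib.request.Request:
    """Build the POST request; field names mirror the curl example."""
    payload = {
        "model": "zen-translator",
        "text": text,
        "source": source,
        "target": target,
        "domain": domain,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def translate_remote(text: str, source: str, target: str,
                     domain: str = "general") -> Dict[str, Any]:
    """Send the request and return the decoded JSON body as-is."""
    request = build_request(text, source, target, domain, os.environ["HANZO_API_KEY"])
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Building the request does not touch the network
example = build_request(
    "The system processes requests asynchronously.", "en", "de", "technical", "test-key"
)
```

Keeping request construction separate from sending makes the payload easy to inspect and test without an API key.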

Specs

| Property | Value |
|----------|-------|
| Parameters | 7B |
| Architecture | Encoder-decoder (seq2seq) |
| Source Languages | 100+ |
| Target Languages | 100+ |
| Max Input Tokens | 8,192 |
| License | Apache 2.0 |
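
For capacity planning, the parameter count gives a rough lower bound on memory. The arithmetic below assumes exactly 7 billion parameters (the table's "7B" is nominal) and counts weights only, so activations and any generation cache come on top. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_parameters: float, bytes_per_parameter: int) -> float:
    """Memory for the weights alone, in GiB (2**30 bytes)."""
    return num_parameters * bytes_per_parameter / 2**30


bf16 = weight_memory_gib(7e9, 2)  # bfloat16 stores each parameter in 2 bytes
fp32 = weight_memory_gib(7e9, 4)  # float32 shown for comparison
print(f"bf16: {bf16:.1f} GiB, fp32: {fp32:.1f} GiB")
```

Loading in bfloat16, as the Python example earlier in this post does, halves the weight footprint relative to float32.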


Zach Kelling is the founder of Hanzo AI, Techstars '17.