Zen Reranker is a neural reranking model built for retrieval-augmented generation pipelines. 7680-dimensional embeddings, 31.87x compression via BitDelta, and native integration with the DSO protocol for decentralized semantic search.
The Retrieval Gap
First-stage retrieval -- vector search over dense embeddings -- is fast but imprecise. It retrieves by approximate similarity, which means it surfaces documents that are topically related but may not actually answer the query. Rerankers fix this by applying a more expensive but more accurate scoring model to the candidate set before it reaches the LLM.
The tradeoff is latency and storage. Rerankers that produce useful signal typically require large embeddings and expensive cross-attention computation. Zen Reranker solves the storage side with BitDelta compression and the compute side by making the 7680-dimensional embeddings the full signal rather than requiring cross-encoder scoring.
7680-Dimensional Embeddings
Most production embedding models produce 768 or 1024-dimensional vectors. Zen Reranker uses 7680 dimensions -- 10x more granular -- which enables finer discrimination between candidates that look similar at lower resolution.
The practical effect is measurable improvement at the top of the ranking where it matters: the documents placed in positions 1-5 are more likely to contain the answer. For RAG systems where the LLM sees only the top-k results, ranking quality at k is the most important metric.
| Embedding Model | Dimensions | NDCG@10 (BEIR avg) |
|---|---|---|
| Standard 768-dim | 768 | 49.2 |
| Standard 1024-dim | 1024 | 51.8 |
| Zen Reranker | 7680 | 58.4 |
BitDelta Compression: 31.87x
7680 dimensions at full precision would require 30 KB of storage per document in a vector database. BitDelta reduces this by learning a 1-bit delta from a base embedding matrix, achieving 31.87x compression while retaining the ranking signal.
At 31.87x compression, 1 million documents stored as Zen Reranker embeddings require approximately 940 MB -- comparable to a 768-dimensional uncompressed index. You get the ranking quality of 7680 dimensions at the storage cost of 768.
This makes Zen Reranker practical for large document corpora where high-dimensional vectors would be prohibitively expensive to store and serve.
RAG Pipeline Integration
Drop-in replacement for any cross-encoder reranker:
from hanzo import ZenReranker
reranker = ZenReranker()
# First-stage: fast approximate retrieval
candidates = vector_store.search(query, top_k=100)
# Rerank: precise scoring on candidates
results = reranker.rerank(query, candidates, top_n=5)
# LLM sees only the top 5, precisely ranked
context = [r.text for r in results]Compatible with LlamaIndex and LangChain reranker interfaces. Hanzo Cloud exposes it at api.hanzo.ai/v1/rerank.
DSO Integration
Zen Reranker integrates with the DSO (Decentralized Semantic Optimization) protocol, ZIP-001 from Zoo Labs Foundation. DSO is a decentralized search network where nodes contribute retrieval capacity and are rewarded for quality signal.
Zen Reranker embeddings are the native vector format for DSO nodes in the Hanzo and Zoo networks. If you are building on DSO or running a Zoo search node, Zen Reranker is the reranking layer.
Get Zen Reranker
- HuggingFace: huggingface.co/zenlm
- Hanzo Cloud API:
api.hanzo.ai/v1/rerank - Python:
pip install hanzo--from hanzo import ZenReranker - Zen LM: zenlm.org -- RAG integration guides
Zach Kelling is the founder of Hanzo AI, Techstars '17.
Read more
Zen Search: A Model Built for Retrieval-Augmented Generation
Zen Search is optimized for RAG pipelines: low hallucination rates, citation-grounded answers, and training specifically for working with retrieved context rather than relying on memorized knowledge.
Zen Embedding: #1 Multilingual Retrieval
Zen Embedding reaches #1 on the MTEB multilingual leaderboard with a 70.58 score, supporting 100+ languages and Matryoshka representation learning across four dimension sizes.
Zen Translator: High-Quality Machine Translation for 100+ Languages
Zen Translator is a specialized translation model covering 100+ languages with document-level coherence, domain adaptation, and tone-preserving translation for production localization pipelines.