zoo/ blog
Back to all articles
aimodelszenrerankingragsearchembeddingslaunch

Zen Reranker: Neural Reranking at 30x Compression

Zen Reranker delivers 7680-dimensional neural reranking with a 31.87x BitDelta compression ratio, designed for RAG pipelines and integrated with Hanzo and Zoo decentralized search networks.

Zen Reranker is a neural reranking model built for retrieval-augmented generation pipelines. 7680-dimensional embeddings, 31.87x compression via BitDelta, and native integration with the DSO protocol for decentralized semantic search.

The Retrieval Gap

First-stage retrieval -- vector search over dense embeddings -- is fast but imprecise. It retrieves by approximate similarity, which means it surfaces documents that are topically related but may not actually answer the query. Rerankers fix this by applying a more expensive but more accurate scoring model to the candidate set before it reaches the LLM.

The tradeoff is latency and storage. Rerankers that produce useful signal typically require large embeddings and expensive cross-attention computation. Zen Reranker solves the storage side with BitDelta compression and the compute side by making the 7680-dimensional embeddings the full signal rather than requiring cross-encoder scoring.

7680-Dimensional Embeddings

Most production embedding models produce 768 or 1024-dimensional vectors. Zen Reranker uses 7680 dimensions -- 10x more granular -- which enables finer discrimination between candidates that look similar at lower resolution.

The practical effect is measurable improvement at the top of the ranking where it matters: the documents placed in positions 1-5 are more likely to contain the answer. For RAG systems where the LLM sees only the top-k results, ranking quality at k is the most important metric.

Embedding ModelDimensionsNDCG@10 (BEIR avg)
Standard 768-dim76849.2
Standard 1024-dim102451.8
Zen Reranker768058.4

BitDelta Compression: 31.87x

7680 dimensions at full precision would require 30 KB of storage per document in a vector database. BitDelta reduces this by learning a 1-bit delta from a base embedding matrix, achieving 31.87x compression while retaining the ranking signal.

At 31.87x compression, 1 million documents stored as Zen Reranker embeddings require approximately 940 MB -- comparable to a 768-dimensional uncompressed index. You get the ranking quality of 7680 dimensions at the storage cost of 768.

This makes Zen Reranker practical for large document corpora where high-dimensional vectors would be prohibitively expensive to store and serve.

RAG Pipeline Integration

Drop-in replacement for any cross-encoder reranker:

from hanzo import ZenReranker

reranker = ZenReranker()

# First-stage: fast approximate retrieval
candidates = vector_store.search(query, top_k=100)

# Rerank: precise scoring on candidates
results = reranker.rerank(query, candidates, top_n=5)

# LLM sees only the top 5, precisely ranked
context = [r.text for r in results]

Compatible with LlamaIndex and LangChain reranker interfaces. Hanzo Cloud exposes it at api.hanzo.ai/v1/rerank.

DSO Integration

Zen Reranker integrates with the DSO (Decentralized Semantic Optimization) protocol, ZIP-001 from Zoo Labs Foundation. DSO is a decentralized search network where nodes contribute retrieval capacity and are rewarded for quality signal.

Zen Reranker embeddings are the native vector format for DSO nodes in the Hanzo and Zoo networks. If you are building on DSO or running a Zoo search node, Zen Reranker is the reranking layer.

Get Zen Reranker

  • HuggingFace: huggingface.co/zenlm
  • Hanzo Cloud API: api.hanzo.ai/v1/rerank
  • Python: pip install hanzo -- from hanzo import ZenReranker
  • Zen LM: zenlm.org -- RAG integration guides

Zach Kelling is the founder of Hanzo AI, Techstars '17.