2017 was the year Hanzo committed fully to AI-powered commerce. Here is what we learned building and deploying machine learning systems at scale.
Lesson 1: Data Quality Trumps Model Complexity
We spent months improving model architectures. Our biggest gains came from fixing data pipelines.
Issues we found:
- Duplicate events from retry logic
- Timezone inconsistencies in timestamps
- Missing data from mobile apps
- Label noise from unclear UI states
A simple model on clean data outperformed a complex model on messy data. Every time.
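Two of the issues above, duplicate events and timezone inconsistencies, lend themselves to a short illustration. The following is a minimal sketch, not our actual pipeline: the `event_id` and `timestamp` field names are hypothetical, and it assumes each retry reuses the original event's ID and that timestamps without an offset are already in UTC.

```python
from datetime import datetime, timezone


def normalize_timestamp(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp and convert it to UTC.

    Assumption: timestamps without an offset are already in UTC.
    """
    parsed = datetime.fromisoformat(ts)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def clean_events(events):
    """Drop events whose ID was already seen and normalize timestamps to UTC.

    Assumption: a retried event carries the same `event_id` as the original.
    """
    seen = set()
    cleaned = []
    for event in events:
        if event["event_id"] in seen:
            continue  # duplicate produced by retry logic
        seen.add(event["event_id"])
        cleaned.append(
            {**event, "timestamp": normalize_timestamp(event["timestamp"])}
        )
    return cleaned
```

Keeping the first occurrence of each ID is a design choice; a pipeline where retries can carry corrected payloads would need a different rule.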
Lesson 2: Latency is a Feature
Recommendations that take 500ms do not get used. Predictions that arrive after decisions are made are worthless.
Our serving infrastructure went through three versions this year:
- V1: Python + Flask (800ms p99)
- V2: Go + gRPC (200ms p99)
- V3: Edge caching + precomputation (50ms p99)
The performance investment unlocked new use cases. Real-time personalization became possible.
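The idea behind precomputation can be sketched in a few lines: run the slow model ahead of time, serve from a lookup table at request time, and fall back to a generic list on a miss. This is an illustrative sketch only. The class name, the `model_fn` callable, and the fallback list are hypothetical, and a real deployment would use a distributed cache rather than an in-process dictionary. The `p99` helper uses the nearest-rank definition of a percentile.

```python
class PrecomputedRecommender:
    """Serve recommendations from a table filled ahead of request time."""

    def __init__(self, fallback):
        self._cache = {}
        self._fallback = list(fallback)  # e.g. globally popular products

    def precompute(self, user_ids, model_fn):
        """Run the (slow) model offline and store its output per user."""
        for user_id in user_ids:
            self._cache[user_id] = model_fn(user_id)

    def recommend(self, user_id, k=3):
        """Constant-time lookup; unknown users get the fallback list."""
        return self._cache.get(user_id, self._fallback)[:k]


def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integers
    return ordered[rank - 1]
```

The trade-off is freshness: precomputed results are only as current as the last batch run, which is why this approach pairs with a retraining and refresh schedule.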
Lesson 3: Explainability Matters
"The algorithm recommended it" is not acceptable to merchants. They need to understand why.
We added explanation generation to all recommendations:
- "Frequently purchased together"
- "Popular in your customer segment"
- "Similar style and price range"
Trust increased. Adoption increased. And we caught model bugs through inconsistent explanations.
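One way to picture how explanations can surface model bugs: attach a reason code to each recommendation, then check that the reason is consistent with the data behind it. The sketch below is an assumption-laden illustration, not our production code. The reason codes, the `Recommendation` fields, and the `max_price_ratio` threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical reason codes mapped to the merchant-facing explanations.
REASONS = {
    "co_purchase": "Frequently purchased together",
    "segment_popular": "Popular in your customer segment",
    "similar_item": "Similar style and price range",
}


@dataclass
class Recommendation:
    product_id: str
    reason_code: str
    anchor_price: float  # price of the product the shopper is viewing
    price: float         # price of the recommended product


def explain(rec: Recommendation) -> str:
    """Return the merchant-facing explanation for a recommendation."""
    return REASONS[rec.reason_code]


def is_consistent(rec: Recommendation, max_price_ratio: float = 1.5) -> bool:
    """Flag 'similar price' explanations whose prices are far apart.

    Only the similar_item reason is checked in this sketch.
    """
    if rec.reason_code != "similar_item":
        return True
    low, high = sorted([rec.anchor_price, rec.price])
    return low > 0 and high / low <= max_price_ratio
```

A recommendation that fails such a check suggests either a wrong explanation or a wrong recommendation, and both are worth investigating.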
Lesson 4: Offline Metrics Lie
Models that excelled in offline evaluation failed in production. The gap between historical data and live traffic was larger than expected.
We moved to online experimentation for all model changes:
- A/B tests with statistical rigor
- Gradual rollouts with automatic rollback
- Business metrics as success criteria
Offline evaluation is for filtering obviously bad ideas. Production is the only truth.
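As an example of what "statistical rigor" can mean for a conversion-rate A/B test, here is a minimal sketch of a two-sided two-proportion z-test using only the standard library. It is one common choice among several and not necessarily the test we ran. The function and parameter names are illustrative.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_*: number of conversions; n_*: number of users in each arm.
    Returns (z, p_value). Assumes the pooled rate is strictly between
    0 and 1, otherwise the standard error is zero.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For instance, `two_proportion_z_test(100, 1000, 150, 1000)` compares a 10% control rate against a 15% treatment rate. A rollout gate would combine a result like this with a pre-registered sample size and a business metric, not the p-value alone.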
Lesson 5: Simple Baselines are Strong
Before deploying any ML model, we compare against simple baselines:
- Popularity ranking
- Recency ranking
- Rule-based heuristics
Many "AI features" do not beat these baselines. We ship the simpler solution when ML does not provide meaningful lift.
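The baselines above are cheap to build, which is part of their value. The sketch below shows popularity and recency rankings, a simple hit-rate metric, and a ship/no-ship rule. All names are hypothetical, and the 5% `min_lift` threshold is an arbitrary placeholder for whatever "meaningful lift" means for a given feature.

```python
from collections import Counter


def popularity_ranking(purchases, k):
    """Top-k products by purchase count; purchases are (user, product) pairs."""
    counts = Counter(product for _, product in purchases)
    return [product for product, _ in counts.most_common(k)]


def recency_ranking(products, k):
    """Top-k newest products; products are (product, created_at) pairs."""
    newest_first = sorted(products, key=lambda item: item[1], reverse=True)
    return [product for product, _ in newest_first[:k]]


def hit_rate(recommended, held_out):
    """Fraction of held-out purchases that appear in the recommended list."""
    hits = sum(1 for product in held_out if product in recommended)
    return hits / len(held_out)


def beats_baseline(model_score, baseline_score, min_lift=0.05):
    """True only if the model clears the baseline by a relative margin."""
    return model_score >= baseline_score * (1 + min_lift)
```

If `beats_baseline` returns False for every baseline, the simpler ranking ships.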
Lesson 6: Operations is Half the Work
Training a model is straightforward. Operating it is hard:
- Retraining schedules
- Feature freshness
- Model versioning
- Drift detection
- Incident response
We built MLOps infrastructure that treats models as production services, not research artifacts.
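Drift detection is the most mechanical item on that list, so it makes a convenient example. One widely used measure is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its distribution in live traffic. This is a sketch of the technique under simple assumptions (equal-width bins taken from the training data, a small floor to avoid taking the log of zero), not a description of our drift monitor.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a reference sample (expected) and a live sample (actual).

    Bins are equal-width over the range of `expected`; live values outside
    that range are clamped into the first or last bin.
    """
    lo = min(expected)
    hi = max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def bucket_shares(values):
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            index = min(max(index, 0), bins - 1)
            counts[index] += 1
        # Floor each share so empty bins do not produce log(0).
        return [max(count / len(values), 1e-6) for count in counts]

    expected_shares = bucket_shares(expected)
    actual_shares = bucket_shares(actual)
    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(expected_shares, actual_shares)
    )
```

A common rule of thumb treats a PSI above roughly 0.2 as a significant shift worth an alert, though the right threshold depends on the feature and should be tuned.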
Looking Ahead to 2018
Our 2018 roadmap:
- Deep learning for product understanding
- Reinforcement learning for optimization
- Federated learning for privacy
- AutoML for merchant-specific models
The fundamentals remain: clean data, fast inference, explainable results, rigorous evaluation.
Zach Kelling is the founder of Hanzo Industries.