By 2013, the Hanzo A/B testing system had accumulated enough campaign data to ask a more interesting question: which combinations of creative elements — across all the campaigns we'd run — had actually worked, and why?
This was not a question you could answer with a dashboard. It required finding structure in high-dimensional data: thousands of campaigns, each with dozens of variable settings, each generating conversion, revenue, and engagement metrics.
We applied two approaches from the machine learning literature that became the foundation of the Earle genetic optimization system.
K-Means Clustering for Campaign Discovery
K-Means clustering groups similar objects — in our case, campaign configurations — into clusters based on feature similarity. Applied to ad combinations, it answers: "Which configurations are similar to each other, and do similar configurations perform similarly?"
We ran K-Means over a feature space that included:
- Creative element choices (headline variant, image type, color scheme)
- Audience segment targets
- Offer structure (discount depth, scarcity signal, social proof style)
- Timing and placement
The clusters that emerged were not the clusters a human would have designed. They were discovered from performance data. Configurations that performed well grouped together in ways that revealed non-obvious relationships — a specific headline style consistently co-occurred with a specific image type and a specific offer structure in the high-performing cluster.
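The clustering step can be sketched roughly as follows. This is an illustrative reconstruction, not the actual Hanzo pipeline: the feature names, encodings, and the tiny toy dataset are all assumptions, and in practice the number of clusters would be chosen with a criterion like the silhouette score.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import OneHotEncoder

# Toy campaign configurations: categorical creative/offer choices
# (illustrative values, not the real Hanzo schema).
campaigns = np.array([
    ["headline_question", "lifestyle_photo", "deep_discount"],
    ["headline_question", "lifestyle_photo", "scarcity"],
    ["headline_statement", "product_shot", "social_proof"],
    ["headline_statement", "product_shot", "deep_discount"],
])

# One-hot encode the categorical features so Euclidean distance
# (what K-Means minimizes) is meaningful.
encoder = OneHotEncoder()
X = encoder.fit_transform(campaigns).toarray()

# Group configurations into k clusters by feature similarity.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
```

Each cluster label then gets joined back against performance metrics to test the question in the text: do similar configurations perform similarly?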
Random Forest for Feature Importance
K-Means tells you what groups together. Random Forest tells you what matters.
We trained a Random Forest model to predict campaign performance from configuration features. Feature importance scores from the trained forest told us which variables had the most predictive power for conversion: "headline_type" ranked above "color_scheme," which ranked above "image_style," which ranked above "font_choice."
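The mechanics look roughly like this sketch, on synthetic data constructed so that headline choice drives conversion strongly, color scheme weakly, and font not at all. The variable names and effect sizes are invented for illustration; only the technique (a forest's impurity-based feature importances) matches the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Synthetic configuration variables (integer-encoded categories).
headline_type = rng.integers(0, 3, n)
color_scheme = rng.integers(0, 4, n)
font_choice = rng.integers(0, 2, n)

# Conversion depends strongly on headline, weakly on color, not on font.
conversion = 0.05 * headline_type + 0.01 * color_scheme + rng.normal(0, 0.01, n)

X = np.column_stack([headline_type, color_scheme, font_choice])
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, conversion)

# Importances sum to 1.0; higher means more predictive power.
for name, score in zip(["headline_type", "color_scheme", "font_choice"],
                       forest.feature_importances_):
    print(f"{name}: {score:.3f}")
```

On data like this the forest recovers the planted ordering, which is exactly the signal used to decide where test traffic is worth spending.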
This had immediate practical value: spend optimization time on variables that move the needle. Stop wasting test traffic on variables that don't.
Combined with K-Means clustering, the picture was clear: a small number of configuration archetypes accounted for the majority of high-performing campaigns, and the key variables in those archetypes were identifiable.
Into Earle
This analysis — K-Means to find structure, Random Forest to find importance — became the theoretical foundation of the Earle genetic algorithm system we built in 2014.
Instead of exhaustively testing all combinations (intractable) or testing one variable at a time (slow and incomplete), Earle used the clustering and importance analysis to initialize genetic algorithm populations in the regions of configuration space most likely to contain high performers.
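The seeding idea can be sketched as follows. This is a hypothetical reconstruction of the initialization step only: the archetype configurations, gene names, and per-gene mutation rates are invented for illustration. The one design point it carries over from the text is that high-importance variables mutate less, so the discovered structure is preserved where it matters most.

```python
import random

# Archetype configurations, e.g. rounded centroids of high-performing
# clusters (illustrative values).
archetypes = [
    {"headline": 0, "image": 1, "offer": 2},
    {"headline": 2, "image": 0, "offer": 1},
]

# Lower mutation rates for higher-importance genes.
mutation_rate = {"headline": 0.05, "image": 0.15, "offer": 0.25}
n_values = {"headline": 3, "image": 3, "offer": 3}

def seed_population(size, rng):
    """Initialize a GA population by mutating copies of archetypes
    instead of sampling configuration space uniformly."""
    population = []
    for _ in range(size):
        genome = dict(rng.choice(archetypes))  # copy an archetype
        for gene, rate in mutation_rate.items():
            if rng.random() < rate:
                genome[gene] = rng.randrange(n_values[gene])
        population.append(genome)
    return population

population = seed_population(50, random.Random(0))
```

Starting the search in regions already known to contain high performers is what made convergence faster than either exhaustive testing or one-variable-at-a-time iteration.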
It converged faster. It discovered better combinations. It found the non-obvious configurations that human intuition would never generate.
The 2013 ML work made the 2014-2016 AI marketing capabilities possible.