"We think you'd also like..." and the Math of Suggestion – Part 1
I still buy vinyl records. There’s something nostalgic about the friction between needle and wax producing my music in the age of iTunes. When I buy records, I don’t go to one of the Kreuzberg outlets near where I live; I go across town to Rotation Records. Why? Because of Niko.
Niko owns Rotation, and his knowledge of the genres he stocks makes it worth taking 90 minutes out of my day to get his input. Niko doesn’t know who I am, but if I ask him and tell him a little about the records I’ve bought in the past, he can get a pretty good idea of what I might be interested in. It’s in his interest to know his stuff: it makes his customers buy more records and keeps them coming back to Rotation.
Recommendations have no doubt been a component of sales since the advent of commerce. But we do live in the age of iTunes, the age of Amazon, where millions of products are stored in faraway warehouses and shipped through complicated, automated logistics. So who is the “Niko” in the world of e-commerce?
Amazon was one of the pioneers in using automatic recommendations to drive sales: $5 billion, or 25% of its annual sales, comes from suggesting products to users, whether by showing related books or by offering personalized music recommendations. iTunes’ “Genius” suggests music downloads, and Last.fm’s recommender system builds personalized radio streams and shows related concerts. Berlin is home to at least three startups focusing on different elements of recommendations: Plista with web personalization, Mufin with music analysis, and my own startup, Directed Edge, with pluggable recommendations offered as a web service.
But aside from large e-commerce sites and startups that focus on recommendations themselves, how are recommendations relevant to the wider startup world? A 2007 survey by ChoiceStream found that 45% of respondents were more likely to shop at a store that provided product recommendations; among users who had spent more than $1,000 online in the previous six months, that figure rose to 69%. The same survey showed that 41% of users were more likely to pay attention to advertising personalized to their tastes. On social media and news sites, recommendations reduce bounce rates by showing related content within the site.
So that’s the “why.” Now, what about the “how”?
Collaborative filtering is the best-known branch of recommender systems. In non-computer-scientist terms, it just means looking at histories of ratings, purchases and clicks and using them to figure out which products or users are similar to each other.
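To make “histories” concrete, here is a minimal sketch, assuming a flat log of (user, item, rating) events; the log, its ratings and the `build_views` helper are hypothetical illustrations, not part of any particular system. The same history is indexed two ways: per user (to find similar users) and per item (to find similar products).

```python
from collections import defaultdict

# Hypothetical interaction log: (user, item, rating on a 1-5 scale).
# Purchases or clicks could be encoded the same way, e.g. rating = 1.
events = [
    ("Bob", "The Art of the Start", 5),
    ("Bob", "Information Rules", 2),
    ("Sarah", "The Art of the Start", 4),
    ("Sarah", "Information Rules", 3),
    ("Josh", "Information Rules", 4),
]

def build_views(log):
    """Index one history two ways: per user and per item."""
    by_user = defaultdict(dict)  # user -> {item: rating}
    by_item = defaultdict(dict)  # item -> {user: rating}
    for user, item, rating in log:
        # A later event for the same (user, item) overwrites an earlier one.
        by_user[user][item] = rating
        by_item[item][user] = rating
    return dict(by_user), dict(by_item)

by_user, by_item = build_views(events)
print(by_user["Bob"])
print(by_item["Information Rules"])
```

Comparing two entries of `by_user` yields user similarity; comparing two entries of `by_item` yields product similarity. Which view a store uses is a design choice, not something collaborative filtering dictates.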
Let’s imagine that Bob and Sarah have both given ratings to “The Art of the Start” and “Information Rules.”
If we want to figure out how similar Bob and Sarah’s tastes are, we can plot their ratings on a graph with one axis per book, so that each user becomes a point.
If we recall a little school geometry, we can figure out how far apart their tastes are. (You do remember the Pythagorean theorem, don’t you?)
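Treating each user’s two ratings as coordinates, the gaps between Bob and Sarah along each axis form the legs of a right triangle, and the distance between them is its hypotenuse. Writing Bob’s ratings as (b_1, b_2) and Sarah’s as (s_1, s_2), with index 1 for “The Art of the Start” and 2 for “Information Rules” (symbols introduced here for illustration):

```latex
d(\text{Bob}, \text{Sarah}) = \sqrt{(b_1 - s_1)^2 + (b_2 - s_2)^2}
```

With hypothetical ratings of 5 and 2 for Bob and 4 and 3 for Sarah, this gives √((5 − 4)² + (2 − 3)²) = √2 ≈ 1.41. With more books, the formula simply gains one squared term per book.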
In linear algebra, the branch of mathematics that dominates collaborative filtering, this distance is called the “Euclidean distance” (the Euclidean norm of the difference between two users’ rating vectors), and it is one of the simplest ways to measure how far apart two users’ tastes are. If we were to add a third user, Josh, and compute his distance from Bob and Sarah, we could say, “Josh’s taste is closer to Sarah’s than to Bob’s.”
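A minimal sketch of the Bob, Sarah and Josh comparison in Python; the book titles come from the example, but all rating values are made up for illustration. The `euclidean_distance` helper is written out by hand to mirror the formula; Python 3.8+ also ships `math.dist`, which computes the same thing.

```python
import math

# Hypothetical 1-5 star ratings, in the order of BOOKS.
BOOKS = ["The Art of the Start", "Information Rules"]
ratings = {
    "Bob":   [5, 2],
    "Sarah": [4, 3],
    "Josh":  [4, 4],
}

def euclidean_distance(a, b):
    """Euclidean norm of the difference vector a - b (any number of books)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

d_josh_sarah = euclidean_distance(ratings["Josh"], ratings["Sarah"])
d_josh_bob = euclidean_distance(ratings["Josh"], ratings["Bob"])
print(f"Josh-Sarah: {d_josh_sarah:.2f}, Josh-Bob: {d_josh_bob:.2f}")

if d_josh_sarah < d_josh_bob:
    print("Josh's taste is closer to Sarah's than to Bob's")
```

With these numbers, Josh’s distance to Sarah is 1.0 and his distance to Bob is about 2.24, so the sketch reaches the same conclusion as the sentence above.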
In practice, these computations aren’t done for two or three users and two books; they’re done for hundreds, thousands, or even millions of products and users. But let’s keep things simple: if we had measured the distance between Bob and every other user in an online store, we could say, “Here are the 10 users closest to Bob.” From there, we have a basis for delivering recommendations: there’s a pretty good chance that Bob will like some of the books those 10 similar users have purchased and he doesn’t own yet.
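A minimal sketch of this nearest-neighbor step, assuming binary purchase vectors (1 = owns the book) instead of star ratings and a tiny made-up catalog. The user Marek, the three extra titles and the default `k = 2` (standing in for the article’s 10, so the toy data stays readable) are all hypothetical, as are the helper names.

```python
import math
from collections import Counter

# Hypothetical purchase histories over a hypothetical catalog.
BOOKS = ["The Art of the Start", "Information Rules", "Crossing the Chasm",
         "The Innovator's Dilemma", "Getting Real"]
purchases = {
    "Bob":   [1, 1, 0, 0, 0],
    "Sarah": [1, 1, 1, 0, 0],
    "Josh":  [1, 0, 1, 1, 0],
    "Anna":  [0, 0, 0, 1, 1],
    "Marek": [1, 1, 1, 1, 0],
}

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_users(target, k):
    """Return the k users whose purchase vectors are closest to target's."""
    others = [u for u in purchases if u != target]
    others.sort(key=lambda u: euclidean_distance(purchases[target], purchases[u]))
    return others[:k]

def recommend(target, k=2):
    """Rank books target doesn't own by how many of the k nearest users own them."""
    owned = purchases[target]
    votes = Counter()
    for neighbor in nearest_users(target, k):
        for i, has_it in enumerate(purchases[neighbor]):
            if has_it and not owned[i]:
                votes[BOOKS[i]] += 1
    return [book for book, _ in votes.most_common()]

print(nearest_users("Bob", 2))
print(recommend("Bob"))
```

For Bob, the two nearest users are Sarah and Marek. Both own “Crossing the Chasm,” so it ranks ahead of “The Innovator’s Dilemma,” which only Marek owns. Counting neighbor votes is just one simple ranking choice; weighting each vote by the neighbor’s distance is a common alternative.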
But there’s a catch.
You see, every time a new book is added to the set of ratings above, a new dimension is added to the computation. Two books, two dimensions; three books, three dimensions; a million books, a million dimensions. And if there were 100,000 books and 100,000 users, measuring the distance between every pair of users would take at least 15 billion mathematical operations. In computer science, such a computation is said to scale “quadratically”: doubling the number of users roughly quadruples the work. In practical terms, it means that just throwing more hardware at the problem won’t make it scale. In part two of this article, I will present other, more recent recommendation strategies that are more feasible for large data sets.
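The quadratic growth comes from the number of user pairs, n(n − 1)/2. A short sketch of that arithmetic; the helper name `user_pairs` is illustrative. For 100,000 users there are about 5 billion pairs, so one plausible reading of the “at least 15 billion” figure is roughly three operations per pair (subtract, square, add) even when two users share a single book; with many shared books, each pair costs far more.

```python
def user_pairs(n_users):
    """Number of distinct user pairs, i.e. distances to compute: n(n-1)/2."""
    return n_users * (n_users - 1) // 2

for n in (1_000, 10_000, 100_000, 200_000):
    print(f"{n:>7} users -> {user_pairs(n):>14,} pairs")

# Doubling the users (100,000 -> 200,000) roughly quadruples the pairs.
ratio = user_pairs(200_000) / user_pairs(100_000)
print(f"growth factor when doubling users: {ratio:.2f}")
```

The loop shows the pair count climbing from 499,500 at 1,000 users to 4,999,950,000 at 100,000 users, and the ratio line shows why adding hardware linearly cannot keep pace with a workload that grows with the square of the user base.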
About the author:
Scott Wheeler began researching graph-based similarity in 2004 as part of a series of talks on “Beyond Hierarchical Interfaces” connected to his contributions as an open source developer. He co-founded Directed Edge in Berlin in 2008 with the goal of developing an easy-to-integrate recommendation engine for partner social media and e-commerce sites.