"We think you'd also like..." and the Math of Suggestion - Teil 2
The recommendation scheme I laid out in part one of this article is what’s known as „user-based collaborative filtering“, meaning that recommendations are computed by finding similar users and then recommending products that those users liked.
Because of the computational limitations of user-based collaborative filtering, around 2001 there was a shift to what’s known as „item-based collaborative filtering“. In item based collaborative filtering, instead of first finding related users and then using those users to find related products or content, item-based strategies first compute the similarity of every item and use that to recommend products.
Let’s take the ratings above, but add a third book, „Blue Ocean Strategy“ and assume that Bob gave it a rating of 5 and Sarah a rating of 7.
Now, instead of seeing how similar Bob and Sarah are, we see how similar each product is to the others. To deliver recommendations for Bob we can then see which product he’s liked and jump to products which are similar to those.
Item-based collaborative filtering works well when the set of items, or products, is fairly static and smaller than the number of users, as is typical in an online store. The advantage of item-based strategies is that we can quickly compute recommendations for Bob without having to first compare him to every other user of the store. If Bob has rated 100 items, we can look at his top 10 highest rated items, find the items related to those that he hasn’t purchased and rank them based on how similar they are to his top 10.
To do this, however, item-based collaborative filtering requires a pre-processing step where the similarity of every product is first computed. Even some of the big names in collaborative filtering – Amazon, Last.fm, for instance – until recently didn’t update their recommendations more often than every few days or once a week because computing the item similarity matrix takes so long.
The complexity of this computation is the same as creating user-based recommendations; it’s the same 15 billion mathematical operations. One of the most active areas of recommendations research, and one of the defining characteristics of commercial recommender systems, is mitigating this, „curse of dimensionality“, as some papers call it, a problem common to collaborative filtering, image processing and natural language processing.
In computer science lingo, a graph is just a set of connections between things. Those connections are called „edges“. A „directed edge“ is a link from one item to another, like a link from one web page to another, and the term my company’s name comes from. One of the most well known applications of graph theory is used in Google’s PageRank, since it ranks the importance of pages based on who links to them and what they link to.
Let’s imagine a small set of friends in a social network:
Bob is a friend of Sarah, and Sarah is a friend of Amy, who is a friend of Josh. Peter is friends with both Amy and Josh. Intuitively, we can guess that Peter and Sarah might get along well since they share some common friends, even though there’s no explicit connection between them.
Graph-based recommendation systems are a less widely known subfield of recommender systems, but they deliver recommendations based on the connections between people or products. For the mathematically inclined, it can be show that you in fact can model any traditional collaborative filtering system within a graph. For instance, if we go back to our list of ratings, we can imagine a connection between Bob and „The Art of the Start“. Graph connections, or „edges“ can also have weights – meaning, they tell us how strong the connection between those two items is. So Bob has a connection to „The Art of the Start“ with a weight of 8, and a connection to „Information Rules“ with a weight of 7.
By looking at how strongly Bob is connected to the products in his „neighborhood“ of the connection structure, just like we could guess that Peter and Sarah might get along well, we can also guess which products Bob is likely to be interested in. In some cases, graphs provide a way to find more obscure links from people to products because there can be a path from Bob through Peter to, say, „Blue Ocean Strategy“.
When we want to see how similar products are in item-based collaborative filtering, we first have to compare every product to every other product, as noted above. In a graph, we can accomplish the same by stepping back one step in the connection structure, to every user that has bought „Information Rules“ and then step forward from each of those users to get a full set of „co-purchased“ items. Rather than having to iterate over every product, we can quickly build the set of potential matches and often deliver recommendations in real-time without a batch processing step.
So, those are the foundations of our electronic „Niko“ in the digital age. For a more technical introduction to recommender systems, check out O’Reilly’s Programming Collective Intelligence.
Über den Autor:
Scott Wheeler began research in graph-based similarity in 2004 as part of a series of talks on „Beyond Hierarchical Interfaces“ connected to his contributions as an Open Source developer. He co-founded Directed Edge in Berlin in 2008 with the goal of developing an easy to integrate recommendation engine for partner social media and e-commerce sites.