Collaborative Filtering: Recommendation Engine with Excel
Have you ever been creeped out by ad banners popped up on Facebook? Ad banners that are so relevant and accurately address your current needs and wants. That is the magic of collaborative filtering.
Collaborative filtering is a method of making automatic predictions about the preference of a consumer by collecting preferential information from various users. The underlying assumption of this approach is that if consumer A shares the same opinion as consumer B on an issue, A is more likely to share the opinion of B on a different issue than other randomly chosen consumer.
Recommendation engines are built using collaborative filtering techniques, making the lives of consumer easier meanwhile profiting businesses; thus, I call it the win-win engine.
Today, I’ll practice on user-based collaborative filtering using a made up data set. Seven people (Luffy, Sanji, Nami, Zoro, Franky, Robin, Usopp) have each rated on 1-5 scale a subset of six of my favorite animes (One Piece, Dragon Ball, Naruto, Pokemon, and Haikyuu). Suppose we would like to predict Robin’s rating for the epic anime Naruto, we will have to generate a reasonable member-based forecast.
Step 1: User similarity.
There are several ways to evaluate similarity of user ratings. We can define the similarity between two people equal to the correlation between their ratings. First, we randomly pick out two people, Luffy and Robin in this case. Use the formula =INDEX($D$8:$I$14,MATCH($C17,$C$8:$C$14,0),D$15) to place the Luffy’s and Robin’s ratings right beneath our table, or in any empty cells.
Before we measure their correlation, note that Excel will use the 0s (corresponding to unseen animes) in its calculation. This is a mistake I had made before, and we don’t want that. Therefore, use the formula =IF(COUNTIF(D$16:D$17,">0")=2,D17,"_ ") to create a new row of ratings omitting the unseen animes.
Now we can go ahead to calculate the correlation using the formula =CORREL(D18:I18,D19:I19). The correlation of 0.9912 indicates that Luffy and Robin have similar tastes in animes. Use two-way data table to compute the correlation for rest of each pair of peoples’ tastes in anime. To use the two-way table, re-enter the correlation formula on the top left of your table. List all of the people’s name in order as shown below.
Select “Data table” from the What-if portion of the Data tools Group and select the two very first correlation cells as row input and column inputs. This allows Excel to loop through all the pairs of anime viewers and yields correlation.
Step 2: Rating Estimation
To estimate Robin’s rating for the best ninja anime ever—Naruto, we should start with her average rating of all anime and compare them with the results of other people. To increase the estimate of Robin’s rating for Naruto, we look at those people: People who have a positive similarity to Robin and like Naruto more than their average anime. Or, people who have a negative similarity to Robin and like Naruto less than their average anime.
On the other hand, to decrease the estimate of Robin’s rating for Naruto, we look at those people: People who have a positive similarity to Robin and like Naruto less than their average anime. Or, people who have a negative similarity to Robin and like Naruto more than their average movie.
Now, the execution. We use the formula =AVERAGE(D8:I8) in cell J9:J14 to compute the average rating for each person. In H34 and I34, choose the anime-person combination for which we want to estimate a rating. Use the formula =VLOOKUP(N35,$C$8:$J$14,8,FALSE) in cell O35:O40 to copy each person’s average rating.
Now, to pull the similarity of each person to the selected person. We first use one MATCH function to ensure we pull the correlations for Robin. We then use a second MATCH function to ensure that we pull the similarity of Robin to each other person. The formula is as follow: =INDEX(correlations,MATCH($I$34,$H$24:$H$30,0),MATCH(N35,$I$23:$O$23,0))
Remember, we need to make sure if a person has not watched Naruto, we’ll enter a value of zero. Use the formula =INDEX(ratings,MATCH(N36,$N$34:$N$40,0),MATCH($H$34,$D$7:$I$7,0))in cell Q35:Q40.
Now we have to compute for each person who has seen Naruto an adjustment equal to the amount by which the person’s rating for Naruto exceeds their average rating. We use the following formula: =IF(AND(N35<>$I$34,Q35>0),(Q35-O35),0) in R35:R40.
To enter the absolute value of the correlation between Robin and each person who has seen Naruto, we use the formula: =IF(AND(N34<>$I$34,Q34>0),ABS(P34),0)
We’re almost done! Compute the total amount by which we can adjust Robin’s rating to obtain an estimate of her rating for Naruto. In O42, use the formula =SUMPRODUCT(R34:R40,P34:P40)/SUM(S34:S40)
FINALLY, to compute the estimate of Robin’s rating for Naruto, enter the formula =J34+O42, in which these two cells are the mean and the total adjustment. And wallah!
There we have Robin’s estimated rating for the anime Naruto.