March 30, 06:42 PM
There's one problem I'm trying to find a solution for, so I'll post it online to see if there are any smart minds lurking who can offer a solution.
Imagine a YouTube-style recommendations feature. That is: given the last x episodes a user has liked, return other episodes (for simplicity, let's say 12) based on users who also liked those same x episodes.
Assume we can traverse the data as: user --> *like --> episode, where each user can have many likes and each like has only one episode.
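A minimal sketch of that data model, assuming each like carries a timestamp so the "last x" likes can be found (the class names, the `liked_at` field and `last_liked_episodes` are hypothetical, not from the post):

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Episode:
    id: str
    title: str


@dataclass(frozen=True)
class Like:
    episode: Episode    # each like points at exactly one episode
    liked_at: datetime  # hypothetical timestamp, used to find the "last x" likes


@dataclass
class User:
    id: str
    likes: list[Like] = field(default_factory=list)  # a user can have many likes


def last_liked_episodes(user: User, x: int) -> list[Episode]:
    """Walk user --> *like --> episode and return the x most recently liked episodes."""
    newest_first = sorted(user.likes, key=lambda like: like.liked_at, reverse=True)
    return [like.episode for like in newest_first[:x]]
```

In a real database this would be a join from users through likes to episodes, ordered by the like's timestamp and limited to x rows.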
I'm not necessarily looking for free advice here. I'm mostly just trying to facilitate a discussion with people who also enjoy doing this kind of stuff.
Thanks! And Happy Hacking!
March 31, 04:23 AM
There are multiple approaches you can take, depending on how much complexity you are willing to take on. Keep in mind that no recommendation system is perfect (for many reasons), but I can think of a few with varying degrees of complexity and success. I will assume x is 10, just so that I can use concrete numbers and make things easier to understand.
You already have the last 10 videos the user watched. Take the history (the last 12 videos, in this case) of every user who watched at least 3 (or 5, or 7... you can experiment to see which gives the best results) of the 10 videos your user watched. This way you have the histories of users who watched a subset of the same videos as the user you want to make recommendations for.
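The neighbor-selection step could be sketched like this, assuming every user's history is stored as a list of video IDs ordered newest first, and that the overlap is checked against each user's full stored history (the function name and parameters are hypothetical):

```python
def similar_user_histories(
    target_id: str,
    target_recent: list[str],
    histories: dict[str, list[str]],
    min_overlap: int = 3,
    history_len: int = 12,
) -> list[list[str]]:
    """Return the last `history_len` videos of every other user whose history
    contains at least `min_overlap` of the target user's recent videos."""
    target = set(target_recent)
    neighbor_histories = []
    for user_id, history in histories.items():
        if user_id == target_id:
            continue  # skip the user we are recommending for
        if len(target.intersection(history)) >= min_overlap:
            neighbor_histories.append(history[:history_len])  # newest first
    return neighbor_histories
```

A linear scan over every user is fine for a sketch; with many users you would more likely keep an inverted index (video -> users who watched it) so only users sharing at least one video are examined.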
Next, from these histories, sort all the videos in descending order of how many times they appear, and discard the videos that were in the initial set of 10.
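The counting and ranking step might look like the following sketch. It assumes each neighbor contributes at most one vote per video (a design choice, not something the post specifies), and that ties keep the order in which videos were first seen; `rank_candidates` is a hypothetical name:

```python
from collections import Counter


def rank_candidates(
    neighbor_histories: list[list[str]],
    already_seen: set[str],
    k: int = 12,
) -> list[str]:
    """Rank videos by how many neighbor histories contain them."""
    counts: Counter = Counter()
    for history in neighbor_histories:
        for video in dict.fromkeys(history):  # de-duplicate, keep order
            counts[video] += 1
    for video in already_seen:
        counts.pop(video, None)  # drop the user's own videos
    # most_common sorts by count; equal counts keep first-seen order
    return [video for video, _ in counts.most_common(k)]
```

Counting once per neighbor stops a single user who rewatched something many times from dominating the ranking.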
You will end up with a list of videos watched by other people who also watched a few videos from your user's history. There are a few downsides:
Just using the history of other users is not enough. What if you could include other information? What about keywords, title, description? I know this can cause problems because people can (and will) mislabel videos, but with enough data you can (partially) rule the bad ones out.
What you could do is build a tag cloud from the videos in the user's history. You include the title, the tags and maybe some keywords from the description. Obviously, you remove common words (in, the, if, where, etc.). You could even weight the elements differently (for example, words in the description could be worth less than words in the tags and the title).
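Building such a weighted tag cloud could be sketched like this. The field weights and the stop-word list below are made-up placeholders to tune, and the function names are hypothetical:

```python
import re
from collections import Counter

# Hypothetical weights: description words count for less than title and tag words.
FIELD_WEIGHTS = {"title": 2.0, "tags": 3.0, "description": 0.5}
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "if", "where", "to"}


def clean_words(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics and drop common words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def video_cloud(title: str, tags: list[str], description: str) -> Counter:
    """Weighted word counts for a single video."""
    cloud: Counter = Counter()
    for word in clean_words(title):
        cloud[word] += FIELD_WEIGHTS["title"]
    for tag in tags:
        for word in clean_words(tag):
            cloud[word] += FIELD_WEIGHTS["tags"]
    for word in clean_words(description):
        cloud[word] += FIELD_WEIGHTS["description"]
    return cloud


def user_cloud(video_clouds: list[Counter]) -> Counter:
    """Sum the clouds of every video in the user's history."""
    total: Counter = Counter()
    for cloud in video_clouds:
        total.update(cloud)  # adds weights word by word
    return total
```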
Once you have this, you build a list of videos whose own tag clouds contain the same tags. You rank them by the number of tags they have in common, again remove the ones the user already saw, and offer the top 12 as suggestions.
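A sketch of that ranking, assuming each video's cloud is stored as a mapping from word to weight and that "common items" means the number of shared words (weights are ignored here; summing the shared weights would be an easy variation). `recommend_by_tags` is a hypothetical name:

```python
def recommend_by_tags(
    user_words: set[str],
    video_clouds: dict[str, dict[str, float]],
    already_seen: set[str],
    k: int = 12,
) -> list[str]:
    """Rank unseen videos by how many words their cloud shares with the user's."""
    scored = []
    for video_id, cloud in video_clouds.items():
        if video_id in already_seen:
            continue  # never recommend something the user already saw
        common = len(user_words.intersection(cloud))  # iterating a dict yields its keys
        if common > 0:
            scored.append((common, video_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [video_id for _, video_id in scored[:k]]
```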
This has the advantage that each video's tag cloud can be calculated when the video is uploaded, so it does not have to be computed during the recommendation process.
You could use a mix of the two as well: use the tag cloud only on the videos common to the other users' histories for 8 or 9 of the 12 recommendations, and include 3-4 that come from the tag cloud alone. This way you can try to avoid creating a "bubble" where the user is only recommended videos based strictly on other users with similar interests.
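One way to sketch the blend, assuming you already have two ranked lists of video IDs (one from other users' histories, possibly re-ranked by tag cloud, and one from the tag cloud alone). The slot split and the top-up rule are assumptions, and `blend` is a hypothetical name:

```python
def blend(
    collaborative: list[str],
    tag_only: list[str],
    total: int = 12,
    tag_only_slots: int = 3,
) -> list[str]:
    """Fill most slots from the collaborative list and the rest from tag-only results."""
    result: list[str] = []
    # Most slots come from videos found in similar users' histories.
    for video in collaborative:
        if len(result) >= total - tag_only_slots:
            break
        if video not in result:
            result.append(video)
    # The remaining slots come from tag-cloud-only results, to soften the "bubble".
    for video in tag_only:
        if len(result) >= total:
            break
        if video not in result:
            result.append(video)
    # If the tag-only list ran short, top up from the rest of the collaborative list.
    for video in collaborative:
        if len(result) >= total:
            break
        if video not in result:
            result.append(video)
    return result
```

With `tag_only_slots=3` you get the 9 + 3 split; setting it to 4 gives 8 + 4.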
Keep in mind though that no system is perfect. You will have to experiment and tweak certain values to make the system better over time.
April 9, 08:46 PM
Thank you for the insight, ppopescu!