🔹 Introduction to Recommendation Systems
-
Definition: Systems that predict how a user will respond to items or content, typically so that the items the user is most likely to prefer can be suggested.
-
Examples:
-
News recommendation (e.g., articles tailored to reading habits).
-
Product recommendation (e.g., Amazon suggesting items based on past purchases).
-
-
Two Main Types:
-
Content-Based Systems: Focus on item properties (e.g., recommending cowboy movies if the user watches many in that genre).
-
Collaborative Filtering Systems: Focus on user-user or item-item similarities based on behavior or ratings.
-
🔹 9.1 A Model for Recommendation Systems
-
Utility Matrix:
-
Rows: Users
-
Columns: Items
-
Values: User ratings (e.g., 1-5 stars)
-
Most entries are unknown (sparse matrix)
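Because most entries are unknown, the matrix is usually stored sparsely. A minimal sketch, assuming a dict-of-dicts layout; the users, items, ratings, and helper names (`rating`, `density`) are made up for illustration:

```python
# Sparse utility matrix: only the known ratings are stored.
# All users, items, and ratings here are made-up illustration data.
utility = {
    "alice": {"movie_a": 5, "movie_b": 3},
    "bob": {"movie_a": 4, "movie_c": 1},
    "carol": {"movie_b": 2},
}

def rating(user, item):
    """Return the known rating, or None when the entry is unknown (blank)."""
    return utility.get(user, {}).get(item)

def density(matrix, n_items):
    """Fraction of all user-item entries that are actually known."""
    known = sum(len(row) for row in matrix.values())
    return known / (len(matrix) * n_items)

items = sorted({item for row in utility.values() for item in row})
print(rating("alice", "movie_a"))    # a known entry
print(rating("alice", "movie_c"))    # an unknown entry
print(density(utility, len(items)))  # share of entries that are filled in
```

The goal of a recommender is to estimate some of the entries for which `rating` returns `None`, not necessarily all of them.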
-
-
Long Tail Effect:
-
Online systems can offer a vast range of items, including niche ones, unlike physical stores.
-
Example: “Touching the Void” was not a bestseller when first published; years later it gained popularity when Amazon's recommendations surfaced it to buyers of the similar book “Into Thin Air”.
-
-
Populating the Matrix:
-
Explicit ratings: Collected directly from users.
-
Implicit feedback: Derived from behavior (clicks, purchases, views).
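Implicit feedback can be turned into a one-valued utility matrix (an item is either "liked" or has no entry at all). A rough sketch; the event log and the choice of which event types count as positive are assumptions made for illustration:

```python
from collections import defaultdict

# Hypothetical behavior log: (user, item, event) tuples.
events = [
    ("alice", "book_1", "view"),
    ("alice", "book_1", "purchase"),
    ("bob", "book_2", "click"),
    ("bob", "book_1", "view"),
]

# Assumption: only these event types count as evidence that the user likes the item.
POSITIVE_EVENTS = {"purchase", "view"}

def implicit_utility(log):
    """Map each user to the set of items with at least one positive event."""
    liked = defaultdict(set)
    for user, item, event in log:
        if event in POSITIVE_EVENTS:
            liked[user].add(item)
    return dict(liked)

likes = implicit_utility(events)
print(likes)
```

An item missing from a user's set means "no information", not a low rating.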
-
🔹 9.2 Content-Based Recommendations
-
Item Profiles:
-
Attributes like genre, director, cast (for movies), or specs for products.
-
Profiles are used to match users with items similar to ones they have liked.
-
-
Feature Extraction:
-
From structured metadata or textual data (e.g., using TF-IDF for documents).
-
Can use tags or user-generated annotations (e.g., del.icio.us, collaborative tagging games).
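To make the TF-IDF idea concrete, here is a small sketch. It assumes one common variant (term count divided by the largest count in the same document, times log2 of N over the number of documents containing the term); other weightings exist, and the three toy documents are invented:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per (non-empty) document.

    TF  = count of the term / count of the most frequent term in that document
    IDF = log2(N / n_t), where n_t is the number of documents containing term t
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    n_docs = len(docs)
    # Iterating a Counter yields each distinct term once, so this counts documents.
    doc_freq = Counter(term for c in counts for term in c)
    scores = []
    for c in counts:
        max_count = max(c.values())
        scores.append({
            term: (count / max_count) * math.log2(n_docs / doc_freq[term])
            for term, count in c.items()
        })
    return scores

docs = [
    "cowboy rides horse",
    "cowboy movie western cowboy",
    "space movie",
]
scores = tf_idf(docs)
for s in scores:
    print(sorted(s.items(), key=lambda kv: kv[1], reverse=True))
```

The highest-scoring words of each document can then serve as its feature set.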
-
-
User Profiles:
-
Created based on the items a user has liked.
-
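Often represented as a vector in the same feature space as the item profiles, e.g., an average of the profiles of liked items (weighted by rating when ratings are available).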
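A minimal sketch of this idea: average the profiles of liked items to get a user profile, then rank unseen items by cosine similarity to it. The three features, the item vectors, and the function names are hypothetical:

```python
import math

# Hypothetical item profiles over the features (western, comedy, sci-fi).
item_profiles = {
    "movie_a": [1, 0, 0],
    "movie_b": [1, 1, 0],
    "movie_c": [0, 0, 1],
    "movie_d": [1, 0, 0],
}

def user_profile(liked_items):
    """Average the profile vectors of the items the user liked."""
    vectors = [item_profiles[i] for i in liked_items]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(u, v):
    """Cosine of the angle between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

profile = user_profile(["movie_a", "movie_b"])
candidates = ["movie_c", "movie_d"]
ranked = sorted(candidates,
                key=lambda i: cosine(profile, item_profiles[i]),
                reverse=True)
print(profile, ranked)
```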
-
-
Document & Image Features:
-
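Documents: features come from content analysis (e.g., the highest-scoring TF-IDF words). Images: content is hard to analyze automatically, so features usually come from user-supplied tags.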
-
-
Classification Approach:
-
Use classifiers (e.g., decision trees) trained on a user’s past behavior to predict preferences.
-
🔹 9.3 Collaborative Filtering
-
Similarity Measures:
-
Compare user or item vectors using:
-
Jaccard similarity (suits binary data such as purchases; ignores rating values)
-
Cosine similarity (uses rating values; blank entries are treated as 0)
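Both measures are short to write down. A sketch on sparse rating dictionaries, where a missing key means an unknown rating; the two users and their ratings are illustrative only:

```python
import math

def jaccard(a, b):
    """Jaccard similarity of the sets of rated items (rating values are ignored)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(r1, r2):
    """Cosine similarity of two sparse rating dicts; blanks count as 0."""
    dot = sum(r1[i] * r2[i] for i in r1.keys() & r2.keys())
    n1 = math.sqrt(sum(v * v for v in r1.values()))
    n2 = math.sqrt(sum(v * v for v in r2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Illustrative ratings on a 1-5 scale.
a = {"hp1": 4, "tw": 5, "sw1": 1}
b = {"hp1": 5, "hp2": 5, "hp3": 4}

print(jaccard(a, b))  # one shared item out of five distinct items
print(cosine(a, b))
```

Note that treating blanks as 0 implicitly reads "unrated" as "disliked"; subtracting each user's average rating first is a common way to soften that.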
-
-
-
Approaches:
-
User-User Similarity: Recommend items liked by similar users.
-
Item-Item Similarity: Recommend similar items to ones the user has liked.
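The user-user approach can be sketched as a similarity-weighted average over the most similar users who rated the target item. This is one simple variant under assumed data; the ratings, the neighborhood size `k`, and the function names are made up:

```python
import math

# Made-up ratings on a 1-5 scale; missing keys are unknown entries.
ratings = {
    "ann": {"m1": 5, "m2": 4, "m3": 1},
    "ben": {"m1": 4, "m2": 5, "m4": 5},
    "cat": {"m1": 1, "m3": 5, "m4": 2},
}

def cosine(r1, r2):
    """Cosine similarity of two sparse rating dicts; blanks count as 0."""
    dot = sum(r1[i] * r2[i] for i in r1.keys() & r2.keys())
    n1 = math.sqrt(sum(v * v for v in r1.values()))
    n2 = math.sqrt(sum(v * v for v in r2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def predict(user, item, k=2):
    """Weighted average of the item's ratings from the k most similar users who rated it."""
    neighbors = [
        (cosine(ratings[user], ratings[other]), ratings[other][item])
        for other in ratings
        if other != user and item in ratings[other]
    ]
    neighbors = sorted(neighbors, reverse=True)[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None  # no similar user has rated this item
    return sum(sim * r for sim, r in neighbors) / total

pred = predict("ann", "m4")
print(pred)
```

Here `ann` is much closer to `ben` than to `cat`, so the estimate for `m4` lands nearer to `ben`'s rating. The item-item variant is the same computation with the roles of rows and columns swapped.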
-
-
Clustering:
-
Cluster users or items to reduce sparsity and improve similarity calculations.
-
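Can be applied hierarchically or iteratively: cluster items and replace their columns with per-cluster average ratings, then cluster users the same way, and repeat until the matrix is dense enough to compare rows and columns reliably.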
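One step of this process, collapsing item columns into cluster columns, might look like the sketch below. It assumes the item-to-cluster assignment is already given (the clustering algorithm itself is not shown), and all names and ratings are invented:

```python
def collapse_items(utility, item_cluster):
    """Replace item columns with cluster columns.

    Each new entry is the user's average rating over the items of that
    cluster they have rated; clusters with no rated items stay blank.
    """
    collapsed = {}
    for user, row in utility.items():
        sums = {}
        for item, r in row.items():
            c = item_cluster[item]
            total, n = sums.get(c, (0, 0))
            sums[c] = (total + r, n + 1)
        collapsed[user] = {c: total / n for c, (total, n) in sums.items()}
    return collapsed

# Made-up ratings and a hypothetical, precomputed item clustering.
utility = {
    "ann": {"hp1": 4, "hp2": 5, "sw1": 1},
    "ben": {"hp3": 3, "sw2": 4},
}
item_cluster = {"hp1": "HP", "hp2": "HP", "hp3": "HP", "sw1": "SW", "sw2": "SW"}

collapsed = collapse_items(utility, item_cluster)
print(collapsed)
```

In the original matrix `ann` and `ben` share no rated item; after collapsing they share both cluster columns, which is how clustering eases sparsity.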
-
🔹 9.4 Dimensionality Reduction: UV Decomposition
-
Matrix Factorization:
-
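Decompose the n×m utility matrix M into an n×d matrix U and a d×m matrix V (with d small) such that the product UV closely approximates M on its known entries; the remaining entries of UV serve as predictions.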
-
-
Root Mean Square Error (RMSE):
-
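Measures prediction accuracy: the square root of the average squared difference between predicted and known ratings.
-
U and V are fitted by minimizing RMSE over the known entries of the utility matrix.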
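A compact sketch of fitting U and V by minimizing squared error on the known entries. It uses plain stochastic gradient descent, which is one common choice and not necessarily the update rule of any particular reference; the data, the rank `d`, the learning rate, and the epoch count are all assumptions:

```python
import math
import random

def predict(U, V, u, i):
    """Entry (u, i) of the product UV."""
    return sum(U[u][k] * V[k][i] for k in range(len(V)))

def rmse(known, U, V):
    """RMSE over the known (user, item, rating) entries only."""
    se = sum((r - predict(U, V, u, i)) ** 2 for u, i, r in known)
    return math.sqrt(se / len(known))

def init_factors(n_users, n_items, d, seed):
    """Random starting point: U is n_users x d, V is d x n_items."""
    rng = random.Random(seed)
    U = [[rng.random() for _ in range(d)] for _ in range(n_users)]
    V = [[rng.random() for _ in range(n_items)] for _ in range(d)]
    return U, V

def fit(known, U, V, epochs=2000, lr=0.01):
    """Stochastic gradient descent on squared error; updates U and V in place."""
    d = len(V)
    for _ in range(epochs):
        for u, i, r in known:
            err = r - predict(U, V, u, i)
            for k in range(d):
                u_k, v_k = U[u][k], V[k][i]
                U[u][k] += lr * err * v_k
                V[k][i] += lr * err * u_k
    return U, V

# Made-up known entries: (user index, item index, rating).
known = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 1, 1), (2, 2, 5), (2, 3, 4),
    (3, 0, 1), (3, 2, 4), (3, 3, 5),
]

U, V = init_factors(4, 4, d=2, seed=0)
before = rmse(known, U, V)
fit(known, U, V)
after = rmse(known, U, V)
print(before, after)
print(predict(U, V, 1, 1))  # estimate for an entry that was never observed
```

This sketch has no regularization, so on real data it would overfit the known entries; it is meant only to show the shape of the computation.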
-
-
Optimization Tricks:
-
Run the optimization from several different starting points, and optionally average the resulting models, since any single run may get stuck in a local minimum.
-
🔹 9.5 The Netflix Challenge
-
Overview:
-
$1M prize for the first algorithm to beat Netflix’s CineMatch by 10%, measured by RMSE.
-
-
Dataset:
-
Included about 100 million ratings from roughly half a million users on about 17,000 movies.
-
-
Winning Strategy:
-
Combined multiple algorithms.
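Why combining helps can be seen on a toy example: when two predictors make errors in different directions, their average can beat both. The predictions, the true ratings, and the equal weights below are invented for illustration and say nothing about how the actual winning entry was blended:

```python
import math

def rmse(preds, truth):
    """Root mean square error between two equal-length lists."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, truth)) / len(truth))

def blend(preds_list, weights):
    """Weighted average of several models' predictions, position by position."""
    total = sum(weights)
    return [
        sum(w * preds[i] for w, preds in zip(weights, preds_list)) / total
        for i in range(len(preds_list[0]))
    ]

truth = [5, 3, 4, 1, 2]
model_a = [4.5, 3.5, 3.0, 1.5, 2.5]  # made-up predictions of one model
model_b = [5.5, 2.0, 4.5, 0.5, 1.0]  # made-up predictions of another model

blended = blend([model_a, model_b], [0.5, 0.5])
print(rmse(model_a, truth), rmse(model_b, truth), rmse(blended, truth))
```

In practice the blend weights would themselves be fitted on held-out ratings.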
-
Time of rating proved useful: some movies tend to be rated higher by people who rate right after viewing, others by people who rate after a delay.
-
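External metadata like IMDB genres provided minimal improvement, possibly because the learning algorithms had already discovered that information from the ratings, and because matching Netflix titles to IMDB entries (entity resolution) is hard to do exactly.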
-
🔹 9.6 Summary Highlights
-
Key Concepts:
-
Utility Matrix: Central to recommendation modeling.
-
Content-Based & Collaborative Filtering: Core architectures.
-
Feature Engineering: Crucial for content-based methods.
-
Matrix Factorization: Powerful for collaborative filtering, with RMSE as the fitting and evaluation metric.
-
Hybrid Models: Combine methods for better performance.
-
Clustering: Aids in overcoming sparsity in user-item interactions.
-
This breakdown gives you clear atomic concepts and relationships that can be structured into a knowledge graph, especially focusing on:
-
Entities: Users, Items, Features, Ratings, Clusters, Algorithms
-
Relations: likes, rates, similar_to, clustered_with, uses_algorithm, optimized_by, influenced_by (e.g., time, features)
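As a starting point for such a graph, the entities and relations can be held as (subject, relation, object) triples. A minimal sketch; the `type:name` identifier scheme, the sample triples, and the `objects` helper are assumptions, not a fixed schema:

```python
# Knowledge-graph triples: (subject, relation, object).
# Identifiers use a made-up "type:name" convention.
triples = [
    ("user:alice", "rates", "item:movie_a"),
    ("user:alice", "likes", "item:movie_a"),
    ("item:movie_a", "similar_to", "item:movie_b"),
    ("user:alice", "clustered_with", "user:bob"),
    ("system:recommender", "uses_algorithm", "algorithm:uv_decomposition"),
    ("algorithm:uv_decomposition", "optimized_by", "metric:rmse"),
    ("rating:alice_movie_a", "influenced_by", "feature:time_of_rating"),
]

def objects(subject, relation):
    """All objects linked to the subject by the given relation."""
    return [o for s, r, o in triples if s == subject and r == relation]

# Items similar to something alice likes: a one-hop content-style suggestion.
suggestions = [
    similar
    for liked in objects("user:alice", "likes")
    for similar in objects(liked, "similar_to")
]
print(suggestions)
```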