How to Build a Recommendation System from Scratch in 2025 (Even If You’re Not Netflix)
So you want Netflix-level recommendations? Same here.
Last weekend I tried to surprise my cousin with a movie pick. I failed. She rolled her eyes and said, “Just let the algorithm choose.” That stung. But it also got me thinking: how hard could it be to build one myself?
Turns out: not that hard.
In the next ten minutes you’ll collect data, pick an algorithm, train a tiny model, and serve it with a cute API. No PhD required. Pinky promise.
Here’s the game plan we’ll follow:
- Part 1 - What a recommender actually is (spoiler: it’s just fancy match-making)
- Part 2 - Data: where to steal it and how to clean it
- Part 3 - Algorithms: collaborative vs content vs “why not both”
- Part 4 - Evaluation: does it work or does it really work?
- Part 5 - Ship it: from laptop to the cloud in 30 lines of code
Ready? Grab coffee. Let’s go.
1. What Even Is a Recommendation System?
Think of it like your best friend who knows you love weird sci-fi and Thai food.
A recommender just does that at scale. It looks at what people do (clicks, buys, binge-watches) and guesses what they’ll want next.
Three flavors exist:
Collaborative Filtering
“People similar to you liked this.”
Classic example: Amazon’s “Customers who bought this also bought…”
Needs zero item details, only user behavior.
Content-Based Filtering
“You liked this sci-fi movie, here are more sci-fi movies.”
Uses item features like genre, director, ingredients, whatever.
Hybrid Models
Mix both. Netflix does this:
- collaborative = “other people’s queues”
- content-based = “it’s a dark comedy with Jason Bateman”
Pick one to start. You can always blend later.
2. Data Collection & Preprocessing (The Boring but Critical Bit)
Good news: you don’t need a warehouse of DVDs. Public datasets are everywhere.
Free Datasets to Steal Right Now
- MovieLens 1M - one million ratings, perfect for starters
- Amazon Reviews - product reviews across dozens of niches
- Spotify Million Playlist - songs and playlists (audio features included)
- Goodreads - books, genres, user shelves
Cleaning Checklist (Copy-Paste Ready)
- Drop duplicates - nobody wants to see “The Matrix” 14 times
- Handle missing ratings - simple mean fill works for small sets
- Normalize - map 1-5 stars to 0-1 for math happiness
- Encode categories - one-hot genres or use embeddings later
Here’s a short pandas snippet that does the basics:
import pandas as pd

df = pd.read_csv('ratings.csv')
df.drop_duplicates(inplace=True)

# Map 1-5 stars onto a true 0-1 scale (dividing by 5 would give 0.2-1.0)
df['rating'] = (df['rating'] - 1) / 4.0

# Attach genres so the content-based model has something to chew on
movies = pd.read_csv('movies.csv')
df = df.merge(movies[['movieId', 'genres']], on='movieId')
See? Not scary.
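The checklist also promised mean-filling and one-hot genres. A minimal sketch of those two steps, assuming the same df and MovieLens-style pipe-separated genre strings:
# Mean-fill missing ratings (good enough for small datasets)
df['rating'] = df['rating'].fillna(df['rating'].mean())

# One-hot encode genres like "Comedy|Romance" into separate columns
genre_dummies = df['genres'].str.get_dummies(sep='|')
df = pd.concat([df, genre_dummies], axis=1)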
3. Picking the Right Algorithm (With Code You Can Run Today)
Let’s build three mini-models in under 50 lines each. Pick whichever feels fun.
3.1 Collaborative Filtering with Surprise (SVD)
Install once:
pip install scikit-surprise
Code:
from surprise import Dataset, Reader, SVD, accuracy
from surprise.model_selection import train_test_split

# Surprise wants (user, item, rating) columns plus the rating scale
data = Dataset.load_from_df(df[['userId', 'movieId', 'rating']], Reader(rating_scale=(0, 1)))
train, test = train_test_split(data, test_size=0.2)

model = SVD(n_factors=50, n_epochs=20, lr_all=0.005, reg_all=0.02)
model.fit(train)

preds = model.test(test)
accuracy.rmse(preds)  # prints the RMSE for you
Tweak n_factors like seasoning: more isn’t always better.
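If you’d rather not season by hand, Surprise ships a grid search. A minimal sketch; the grid values here are just plausible starting points, not gospel:
from surprise.model_selection import GridSearchCV

param_grid = {'n_factors': [20, 50, 100], 'reg_all': [0.02, 0.05]}
gs = GridSearchCV(SVD, param_grid, measures=['rmse'], cv=3)
gs.fit(data)

print(gs.best_score['rmse'], gs.best_params['rmse'])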
3.2 Content-Based Filtering Using Plot Summaries
Got movie overviews? Turn them into vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies['overview'].fillna(''))

# Pairwise similarity of every movie against every other movie
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

def get_recs(title, top_n=5):
    # Assumes movies has a default RangeIndex, so label == row position
    idx = movies[movies['title'] == title].index[0]
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Drop the first match: it's the movie itself (similarity 1.0)
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[1:top_n+1]
    return movies['title'].iloc[[i[0] for i in sim_scores]]
print(get_recs("Toy Story"))
Boom: five movies with similar plots.
3.3 Hybrid: LightFM (Mix Both Worlds)
pip install lightfm
LightFM handles both user-item interactions and item features (we’ll stick to interactions here to keep it short).
from lightfm import LightFM
from lightfm.data import Dataset as LDataset

# Tell LightFM which users and items exist so it can build its ID mappings
ld = LDataset()
ld.fit(users=df['userId'].unique(), items=df['movieId'].unique())

# (user, item, weight) triples become a sparse interaction matrix
interactions, weights = ld.build_interactions(df[['userId', 'movieId', 'rating']].values)

model = LightFM(loss='warp')  # WARP optimizes ranking, good for top-N
model.fit(interactions, epochs=30, num_threads=2)
Predicting top-N for any user takes just a few more lines. Neat, huh?
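Here’s a minimal sketch of that prediction step; ld.mapping() converts your raw IDs into LightFM’s internal indices, and user 42 is just a placeholder for a real userId from your data:
import numpy as np

# ld.mapping() returns the raw-ID -> internal-index dictionaries
user_map, _, item_map, _ = ld.mapping()
n_items = len(item_map)

# Score every item for one user, then grab the 10 best
scores = model.predict(user_map[42], np.arange(n_items))
top_items = np.argsort(-scores)[:10]
To show real movie IDs, invert item_map and look the indices back up.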
4. Does It Actually Work? Three Ways to Check
Offline Metrics (Quick & Dirty)
- RMSE - lower is better for ratings
- Precision@K - out of the top K (say 5), how many were hits? (sketch after this list)
- MAP@K - mean average precision across all users
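RMSE comes free with Surprise; Precision@K is easy to roll yourself. A minimal sketch, where recommended is your ranked list for one user and relevant is the set of items they actually liked:
def precision_at_k(recommended, relevant, k=5):
    # Fraction of the top-k recommendations that were actual hits
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

print(precision_at_k([1, 2, 3, 4, 5], {2, 5, 9}))  # 0.4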
Online Test (The Real Judge)
Spin up an A/B test on your site:
50% see old “most popular” list, 50% see your shiny new recs.
Track click-through and watch the magic (or the meltdown).
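Want to know whether the lift is real and not noise? A two-proportion z-test is the standard quick check. A rough sketch with made-up click counts (needs pip install statsmodels):
from statsmodels.stats.proportion import proportions_ztest

clicks = [480, 560]          # control vs. new recs (made-up numbers)
impressions = [10000, 10000]

stat, p_value = proportions_ztest(clicks, impressions)
print(f"p = {p_value:.4f}")  # under 0.05? the lift is probably real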
Sanity Checklist Before You Celebrate
- Cold-start users: new folks with no history.
- Popularity bias: are you just pushing blockbusters? (quick check below)
- Diversity: does the list feel fresh or same-y?
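A quick-and-dirty popularity-bias check, assuming all_recs is a list of recommendation lists you’ve already generated (one per user):
from collections import Counter

# The 20 most-rated movies stand in for "blockbusters"
blockbusters = set(df['movieId'].value_counts().head(20).index)

rec_counts = Counter(item for recs in all_recs for item in recs)
share = sum(c for item, c in rec_counts.items() if item in blockbusters) / sum(rec_counts.values())
print(f"{share:.0%} of recommendations are top-20 blockbusters")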
5. Ship It: From Notebook to the World
You trained it. Now let people poke it.
Option A: Flask Micro-API (5 minutes)
from flask import Flask, jsonify
import numpy as np

app = Flask(__name__)

@app.route('/rec/<int:user_id>')
def recommend(user_id):
    # model and n_items come from the LightFM step in Part 3
    scores = model.predict(user_id, np.arange(n_items))
    top = np.argsort(-scores)[:10]
    return jsonify(top.tolist())

app.run()
Run on localhost, then expose with ngrok for quick demos.
Option B: FastAPI + Docker (Production-ish)
- Usually benchmarks faster than Flask, thanks to async request handling
- Auto-generated docs at /docs (see the sketch below)
- Containerize and push to Fly.io or Render for free hosting
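For flavor, here’s a minimal FastAPI sketch of the same endpoint, again assuming the model and n_items from Part 3 are in scope:
from fastapi import FastAPI
import numpy as np

app = FastAPI()

@app.get('/rec/{user_id}')
def recommend(user_id: int):
    scores = model.predict(user_id, np.arange(n_items))
    top = np.argsort(-scores)[:10]
    return {'recommendations': top.tolist()}
Run it with uvicorn main:app and poke the free Swagger UI at /docs.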
Option C: Serverless (AWS Lambda + API Gateway)
Pay only when people ask for recs.
Package your model with AWS SAM or Serverless Framework.
Cold starts hurt, so keep the model under 100 MB.
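The handler itself can stay tiny. A rough sketch, assuming a pickled LightFM model (model.pkl is a hypothetical path):
import json
import pickle
import numpy as np

# Loaded once per container, so warm invocations skip the slow part
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)
n_items = model.item_embeddings.shape[0]  # LightFM exposes this after fit

def handler(event, context):
    user_id = int(event['pathParameters']['user_id'])
    scores = model.predict(user_id, np.arange(n_items))
    top = np.argsort(-scores)[:10]
    return {'statusCode': 200, 'body': json.dumps(top.tolist())}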
Real-World Pitfalls (So You Don’t Cry Later)
- Data sparsity - most users rate almost nothing. Use implicit feedback (views, clicks).
- Shifting tastes - retrain weekly, not yearly.
- Privacy regs - GDPR says “ask before you stalk.” Anonymize user IDs.
- Latency - matrix factorization can be slow. Cache top-N offline and refresh nightly (sketch below).
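That last pitfall deserves a sketch. Precompute everyone’s top-N in a nightly batch job and serve cheap lookups at request time; a minimal version using the LightFM model from Part 3 (swap the JSON file for Redis if you’re fancy):
import json
import numpy as np

n_users = model.user_embeddings.shape[0]  # LightFM internal user count

# Nightly batch: score everyone offline, serve lookups at request time
cache = {}
for user_id in range(n_users):
    scores = model.predict(user_id, np.arange(n_items))
    cache[user_id] = np.argsort(-scores)[:10].tolist()

with open('top_n_cache.json', 'w') as f:
    json.dump(cache, f)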
Quick FAQ (Because I Know You’re Wondering)
Q: Do I need GPUs?
A: Not for 1 million ratings. A laptop is fine.
Q: What about deep learning?
A: Start simple. Neural nets are the cherry, not the cake.
Q: How much data is “enough”?
A: Rule of thumb: 10× more interactions than users + items combined.
What’s Next?
- Add side info: mood tags, price filters, time of day.
- Try implicit feedback with Bayesian Personalized Ranking.
- Experiment with Graph Neural Networks once you hit 10 million users.
- Read “Recommender Systems Handbook” for bedtime thrills.
“The best recommendation is the one that feels like your own idea.”
#recommendationsystems #machinelearning #python #datascience #personalization