My Learning Hub

Machine Learning (core)

Become a from-scratch-fluent ML practitioner: implement the classical algorithms, backprop, and a transformer by hand, compete on real data on Kaggle, and reproduce a published paper from its equations, understanding every line rather than just calling .fit().

Most "ML engineers" are API operators who panic the moment a model misbehaves, because they never built the machinery.

The roadmap

Stage 1 — Classical ML from scratch, then with scikit-learn · 8-12 weeks
Implement the core supervised and unsupervised algorithms in pure NumPy until you can derive their loss functions and update rules from memory, then re-do them in scikit-learn and learn to trust the bias-variance / regularization story in your bones.

Concepts, resources and problems

Concepts — Linear regression: the normal equation, gradient descent, MSE loss derived from maximum likelihood under Gaussian noise · Logistic regression: sigmoid, cross-entropy loss, the log-odds interpretation, why MSE is a poor loss here (non-convex in the weights, with vanishing gradients when the sigmoid saturates) · Regularization: L1 (Lasso) vs L2 (Ridge), what they geometrically do to weights, elastic net · The bias-variance tradeoff made concrete: learning curves, underfitting vs overfitting, why more data sometimes doesn't help (it cannot fix a high-bias model) · k-Nearest Neighbors and the curse of dimensionality · Decision trees: information gain / Gini impurity, how a split is chosen, why trees overfit · Ensembles: bagging, random forests, gradient boosting (the intuition behind XGBoost, and why boosting with squared loss fits residuals) · Support Vector Machines: the margin, the kernel trick, soft margins, the dual formulation · Unsupervised: k-Means (and why it is a hard-assignment limiting case of EM for Gaussian mixtures), PCA from the covariance matrix / SVD · Cross-validation, train/val/test discipline, data leakage, and proper evaluation metrics (precision/recall/F1, ROC-AUC, confusion matrices)

Read — An Introduction to Statistical Learning (ISLP, Python edition) — free PDF · The Hundred-Page Machine Learning Book · The Elements of Statistical Learning (ESL) — free PDF from the authors · scikit-learn User Guide

Watch — 3Blue1Brown — Essence of Linear Algebra · StatQuest with Josh Starmer — Machine Learning playlist · Andrew Ng — Supervised Machine Learning: Regression and Classification (Course 1, ML Specialization)

Problems

Done when — You have a personal GitHub repo of from-scratch implementations (linreg, logreg, tree, random forest, gradient boosting, k-means, PCA, and a kernel SVM) whose outputs match scikit-learn to several decimals, plus a cross-validated Titanic submission on the public leaderboard. You can explain, on a whiteboard, why L2 shrinks weights and L1 zeroes them, and why boosting fits residuals.
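As a starting point for the linreg part of that repo, here is a minimal NumPy sketch (not a reference implementation) that fits the same linear model two ways, via the normal equation and via batch gradient descent on MSE, and checks that the weights agree. The function names, the synthetic data, the learning rate, and the step count are all illustrative assumptions.

```python
import numpy as np

def fit_normal_equation(X, y):
    """Closed-form least squares: solve (X^T X) w = X^T y."""
    Xb = np.c_[np.ones(len(X)), X]            # prepend a bias column
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

def fit_gradient_descent(X, y, lr=0.1, steps=5000):
    """Batch gradient descent on mean squared error."""
    Xb = np.c_[np.ones(len(X)), X]
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        grad = 2.0 / len(y) * Xb.T @ (Xb @ w - y)   # d(MSE)/dw
        w -= lr * grad
    return w

# Synthetic data (assumed for illustration): intercept 4.0 plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = 4.0 + X @ true_w + 0.1 * rng.normal(size=200)

w_closed = fit_normal_equation(X, y)
w_gd = fit_gradient_descent(X, y)
print("max |closed-form - gradient descent| =", np.max(np.abs(w_closed - w_gd)))
```

If the two disagree by more than rounding error, suspect the learning rate or the step count before suspecting the math.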

Stage 2 — Neural network fundamentals — backprop from scratch · 6-10 weeks
Build automatic differentiation and a multi-layer perceptron from absolute zero (a micrograd-style engine), then understand training dynamics deeply enough to debug a training run that's going wrong.

Concepts, resources and problems

Concepts — The neuron as a composition of differentiable operations; the computational graph · Backpropagation = the chain rule applied recursively over the graph; reverse-mode autodiff · Building an autograd engine: forward pass stores a graph, backward pass accumulates gradients (and why you must zero them) · MLPs: layers, activation functions (ReLU, tanh, sigmoid, GELU), the universal approximation intuition · Loss functions: MSE, cross-entropy, and the softmax that feeds it (and the numerically-stable log-sum-exp trick) · Optimization: SGD, mini-batches, momentum, Adam, learning rate, and why the learning rate is the most important hyperparameter · Training dynamics: vanishing/exploding gradients, weight initialization (Xavier/He), the importance of normalizing inputs · Regularization for nets: dropout, weight decay, early stopping, data augmentation · Batch normalization: what it does to the loss landscape and why it helps (and what it does to gradient statistics) · Diagnosing training: reading loss curves, the train/val gap, overfitting vs an actual bug

Read — Michael Nielsen — Neural Networks and Deep Learning (free online book) · Karpathy — 'A recipe for training neural networks' blog post · CS231n course notes — Optimization, Backprop, Neural Nets modules

Watch — Andrej Karpathy — The spelled-out intro to neural networks and backpropagation: building micrograd · Andrej Karpathy — Neural Networks: Zero to Hero (full playlist) · 3Blue1Brown — Neural Networks series (esp. 'Backpropagation calculus')

Problems

Done when — You've built a working autograd engine from scratch, trained a hand-coded NumPy MLP to >98% on MNIST without any framework, and gradient-checked every layer of a CS231n-style net. You can sketch the backward pass of a 2-layer net on a whiteboard and explain why a learning rate that's 10x too high diverges.
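To make "autograd engine" concrete, here is a deliberately tiny sketch in the spirit of micrograd: a scalar Value class that records its parents during the forward pass and accumulates gradients in reverse topological order. It is a hypothetical minimal engine written for this page, not Karpathy's code; the class layout, the three supported ops, and the test neuron are assumptions. The last lines compare the result with a centered finite difference, which is the same gradient-checking idea the stage asks for.

```python
import math

class Value:
    """A scalar that remembers how it was computed (illustrative minimal engine)."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None      # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad          # += so reused nodes accumulate
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1.0 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topological order: a node propagates only after its own grad is complete.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# One tanh neuron: out = tanh(w * x + b)
w, x, b = Value(0.5), Value(-1.2), Value(0.3)
out = (w * x + b).tanh()
out.backward()

# Independent check on d(out)/dw with a centered finite difference.
h = 1e-6
numeric = (math.tanh((0.5 + h) * -1.2 + 0.3) - math.tanh((0.5 - h) * -1.2 + 0.3)) / (2 * h)
print("autograd:", w.grad, "numeric:", numeric)
```

Because every op uses `+=`, calling backward twice on the same graph would double the gradients, which is why training loops zero them between steps.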

Stage 3 — The modern toolkit — PyTorch, CNNs, attention, embeddings · 10-14 weeks
Become fluent in PyTorch (you've earned it by suffering through NumPy), then build the modern architecture families — CNNs for vision, embeddings + attention for sequences — culminating in a from-scratch transformer.

Concepts, resources and problems

Concepts — PyTorch mechanics: tensors, autograd, nn.Module, optimizers, the training loop, moving to GPU, and how .backward() is just your micrograd at scale · Convolutions: filters, stride, padding, receptive fields, feature maps, why weight-sharing + locality beats dense layers on images · Classic CNN architectures: LeNet -> AlexNet -> ResNet, and what residual connections solve for gradient flow · Embeddings: dense vector representations of discrete tokens; word2vec intuition; why embeddings are learned lookup tables · Sequence models: RNNs, the vanishing-gradient problem in time, LSTMs/GRUs as a partial fix · Self-attention: queries, keys, values; softmax(QKᵀ/√d)V; why attention replaced recurrence; positional encodings · The transformer block: multi-head attention, causal masking, layer norm, residual connections, the feed-forward sublayer · Transfer learning and fine-tuning: standing on pretrained shoulders · Tokenization (BPE) and the encoder/decoder vs decoder-only distinction

Read — The Annotated Transformer (Harvard NLP) · Jay Alammar — The Illustrated Transformer · Dive into Deep Learning (d2l.ai) — free interactive book · Attention Is All You Need (the original paper)

Watch — Andrej Karpathy — Let's build GPT: from scratch, in code, spelled out · fast.ai — Practical Deep Learning for Coders (full course) · Karpathy — makemore series (building language models, WaveNet, BatchNorm internals)

Problems

Done when — You've trained a from-scratch transformer that generates coherent text, a from-scratch ResNet-style CNN to >90% on CIFAR-10, and passed CS231n's hardest assignments cold. You're fluent enough in PyTorch to implement any architecture from its paper, and can explain self-attention's Q/K/V math without notes.
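The Q/K/V math from the concepts list, softmax(QKᵀ/√d)V with a causal mask, fits in a few lines. The sketch below uses NumPy rather than PyTorch so that nothing is hidden; it covers a single head with no batching, and the function names, shapes, and random weights are assumptions chosen for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head attention. x: (T, d_model); returns (T, d_head) and (T, T) weights."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)              # (T, T) similarity of each query to each key
    T = x.shape[0]
    future = np.triu(np.ones((T, T), dtype=bool), k=1)   # True strictly above the diagonal
    scores = np.where(future, -np.inf, scores)           # a position may not look ahead
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
T, d_model, d_head = 5, 8, 4
x = rng.normal(size=(T, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))

out, weights = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)                 # (5, 4)
print(np.round(weights, 2))      # lower-triangular rows that each sum to 1
```

A quick sanity test for any causal implementation: perturb the last token and confirm that the outputs at earlier positions do not change.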

Stage 4 — The practitioner layer — data, evaluation, and winning at Kaggle · 8-12 weeks
Cross the gap between 'works on MNIST' and 'works on messy real data nobody cleaned for you' — data pipelines, rigorous evaluation, fighting overfitting in the wild, and competing seriously on Kaggle.

Concepts, resources and problems

Concepts — Real data pipelines: loading, cleaning, handling missing values, categorical encoding, feature scaling — and doing it without leakage · Feature engineering as the highest-leverage skill: domain-driven features, target/mean encoding done safely (out-of-fold), interaction terms · Robust evaluation: stratified k-fold, group/time-series cross-validation, why your CV must mirror the test split · The validation set is sacred: how to set up a CV scheme you can trust so the leaderboard doesn't surprise you · Overfitting in the wild: leaderboard overfitting, the public/private split, why your local CV is more trustworthy than the public LB · Class imbalance, proper metrics for it, threshold tuning, probability calibration · Gradient boosting in practice: XGBoost / LightGBM / CatBoost, the tabular workhorses, and hyperparameter tuning (Optuna) · Ensembling and stacking: blending diverse models to win the last few decimal places · Experiment discipline: tracking runs (Weights & Biases / MLflow), reproducibility, seeds, and not fooling yourself · Error analysis: looking at what your model gets wrong and why, instead of staring at a single aggregate number

Read — Andrew Ng — Machine Learning Yearning (free book) · Kaggle Learn micro-courses + winning-solution write-ups · Approaching (Almost) Any Machine Learning Problem — Abhishek Thakur (free PDF on GitHub) · Google — Rules of Machine Learning (Martin Zinkevich)

Watch — fast.ai — Practical Deep Learning for Coders (the data + Kaggle-focused lessons) · Abhishek Thakur — Kaggle competition walkthroughs (YouTube channel) · StatQuest — XGBoost, ROC/AUC, and Cross-Validation explained

Problems

Done when — You've earned a medal (or at minimum placed top 25%) in a LIVE Kaggle competition against real competitors, with a cross-validation scheme you trust more than the public leaderboard — and you were right. You can take a raw, messy CSV and produce a leak-free, well-evaluated model with a write-up explaining every decision.
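"Target encoding done safely (out-of-fold)" is easier to see in code than in words. In the sketch below, each row's encoding is computed only from rows in the other folds, so a row never sees its own label. This is one simple way to do it, assuming plain shuffled folds and a fallback to the training-fold mean for unseen categories; the function name and the toy data are hypothetical, and real pipelines often add smoothing.

```python
import numpy as np

def oof_target_encode(categories, target, n_splits=5, seed=0):
    """Out-of-fold mean target encoding: each row is encoded without its own label."""
    categories = np.asarray(categories)
    target = np.asarray(target, dtype=float)
    n = len(target)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_splits)
    encoded = np.empty(n)
    for held_out in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[held_out] = False
        prior = target[train_mask].mean()          # fallback for unseen categories
        means = {}
        for c in np.unique(categories[train_mask]):
            means[c] = target[train_mask & (categories == c)].mean()
        encoded[held_out] = [means.get(c, prior) for c in categories[held_out]]
    return encoded

# Toy data: "rare" appears once, so a naive encoding would simply copy its label.
cats = ["a"] * 10 + ["b"] * 10 + ["rare"]
y = np.array([0.0] * 10 + [1.0] * 10 + [1.0])

encoded = oof_target_encode(cats, y)
naive_rare = y[np.asarray(cats) == "rare"].mean()
print("naive encoding for 'rare':", naive_rare)        # equals its own label: leakage
print("out-of-fold encoding for 'rare':", encoded[-1]) # falls back to the fold prior
```

The single-occurrence category is the clearest failure case: encoded naively, the feature is the label.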

Stage 5 — Paper reproduction & consolidation — earn the right to specialize · 8-14 weeks
Prove you can stand on the frontier's doorstep: read a real ML paper and reproduce its core result from scratch, then consolidate everything into a coherent mental model so any specialization (LLMs, RL, vision, systems) is a short hop.

Concepts, resources and problems

Concepts — How to read an ML paper: the three-pass method (skim, read, reproduce), finding the one key idea · Reproducibility as a discipline: matching reported numbers, controlling seeds, the gap between paper and reality · Reading code alongside papers: nanoGPT, minGPT, and reference implementations as a learning tool · Connecting the dots: how regularization, optimization, and architecture choices recur across every model you've built · Compute literacy: training on GPUs, mixed precision (bf16/fp16), gradient accumulation, when you're memory-bound vs compute-bound · Knowing the map of specializations (LLMs/NLP, computer vision, RL, generative/diffusion, ML systems/MLOps, interpretability) so you can choose your next obsession deliberately · Communicating ML: writing up a result clearly enough that someone else could reproduce it (the Distill standard) · Building intuition for what's hard vs easy in current research, and where the open problems are

Read — Distill.pub — the archive of clearly-explained ML research · Lilian Weng's blog (Lil'Log) · ML Reproducibility Challenge (official site) · arXiv cs.LG / cs.CL — and the 'foundational papers' you've earned the background to read

Watch — Yannic Kilcher — paper explanation videos · Karpathy — Let's reproduce GPT-2 (124M) · fast.ai — Part 2: From Deep Learning Foundations to Stable Diffusion

Problems

Done when — You've reproduced a published ML result from the paper alone (numbers matching), made and measured one original modification, and written a public explainer good enough that strangers thank you for it. You now know the field's map well enough to choose your next specialization on purpose, not by accident.
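Gradient accumulation, from the compute-literacy item above, rests on one piece of arithmetic: for a mean loss and equal-sized micro-batches, the average of the micro-batch gradients equals the full-batch gradient. The NumPy sketch below checks that on a linear model with MSE. The model, sizes, and names are assumptions for illustration; in a framework training loop the same 1/accum_steps scaling is usually applied to the loss before the backward call.

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of mean squared error for a linear model y_hat = X @ w."""
    return 2.0 / len(y) * X.T @ (X @ w - y)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))
y = rng.normal(size=64)
w = rng.normal(size=4)

# One big batch of 64 rows.
full = mse_grad(w, X, y)

# Four micro-batches of 16 rows, each gradient scaled by 1 / accum_steps.
accum_steps = 4
accumulated = np.zeros_like(w)
for Xb, yb in zip(np.split(X, accum_steps), np.split(y, accum_steps)):
    accumulated += mse_grad(w, Xb, yb) / accum_steps

print("max difference:", np.max(np.abs(full - accumulated)))
```

The equality breaks if the micro-batches have unequal sizes or if the scaling is forgotten, which silently multiplies the effective learning rate by accum_steps.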

Projects

  • scikit-from-scratch: a classical ML library in pure NumPy — A clean, tested Python package implementing linear & logistic regression (closed-form + gradient descent), a decision tree, a random forest, gradient boosting, k-NN, k-means, and PCA — each with a fit/predict API mirroring sklearn, and a test suite that asserts your outputs match sklearn within tolerance.
  • micrograd++: an autograd engine and neural-net library you built — A from-scratch reverse-mode automatic differentiation engine (à la Karpathy's micrograd) extended into a usable mini-framework: Tensor/Value class with a full set of differentiable ops, an nn module with Linear/MLP/activation layers, optimizers (SGD + momentum + Adam), and loss functions — all trainable on MNIST to >98%.
  • A from-scratch Transformer + character-level language model — Implement a transformer from the equations up in PyTorch — multi-head causal self-attention, positional encodings, the full block with layer norm and residuals — and train it into a working character-level language model that generates plausible text from a corpus of your choice (Shakespeare, code, your own writing).
  • A serious live Kaggle campaign for a medal — Pick an active Kaggle competition and run a full, disciplined campaign: thorough EDA, a leak-free cross-validation scheme you trust, iterative feature engineering, a tuned model or stacked ensemble, experiment tracking, and a public write-up of your approach and what you learned. Goal: a medal (top 10%), or at minimum top 25%, against real competitors.
  • Paper reproduction: rebuild a published result from scratch (FLAGSHIP) — Choose one landmark paper (ResNet, BatchNorm, Dropout, word2vec, the original GAN, or reproduce GPT-2 124M with Karpathy) and reproduce its headline result from the paper alone — matching the reported numbers — then make and measure one original modification of your own. The portfolio centerpiece of this entire path.
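The "outputs match within tolerance" idea behind the first project's test suite can be practiced even before sklearn enters the picture, by computing the same quantity through two independent routes. As a sketch, the code below does PCA once from the covariance matrix and once from the SVD of the centered data, then compares explained variances and component directions. Function names and the synthetic data are assumptions; components are compared up to sign because an eigenvector's sign is arbitrary.

```python
import numpy as np

def pca_cov(X, k):
    """PCA via eigendecomposition of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]      # keep the k largest
    return eigvals[order], eigvecs[:, order]

def pca_svd(X, k):
    """PCA via SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return s[:k] ** 2 / (len(X) - 1), Vt[:k].T

# Columns scaled differently so the leading eigenvalues are well separated.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])

var_cov, comp_cov = pca_cov(X, k=2)
var_svd, comp_svd = pca_svd(X, k=2)

# |cosine| between matching components; 1.0 means same direction up to sign.
alignment = np.abs(np.sum(comp_cov * comp_svd, axis=0))
print(var_cov, var_svd, alignment)
```

The same pattern (fit both, compare with a tolerance, normalize any arbitrary sign or ordering first) carries over directly to tests against sklearn.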

Going harder

Hard problem arena — 9 brutal problems
  • hard · Karpathy's micrograd / makemore / 'build GPT from scratch' challenges — The defining from-scratch gauntlet for this path. Close the video, rebuild the autograd engine, the MLP, and the transformer from a blank file, then match PyTorch's gradients to machine precision. The exercises at the end of each lecture are where the real difficulty hides.
  • hard · Stanford CS231n assignments, done cold — Implement affine/ReLU/softmax/batchnorm/dropout layers, full backprop, a CNN, and RNN/LSTM/attention from scratch in NumPy under a strict gradient-checking harness, then in a framework. The canonical academic version of this roadmap's from-scratch builds — famously rigorous, and unforgiving of hand-waving.
  • brutal · Reproduce GPT-2 (124M) end-to-end — Rebuild a real published model from scratch, confronting the compute, data, and engineering reality professionals face (mixed precision, data loading, ~$10 of GPU). Matching the loss curve is brutal; making one original, measured modification afterward is where you cross from student to researcher.
  • legendary · ML Reproducibility Challenge — reproduce a top-conference paper — A structured, community-run attempt to reproduce papers accepted at NeurIPS/ICML/ICLR, with a real venue to submit your report to. Pick one, rebuild its core claim from the equations, and report whether it holds. As close to frontier research as a self-learner can get.
  • legendary · Live 'Featured' Kaggle competitions (prize-money tier) — The hardest competitive ML on Earth lives here: real money, Grandmasters, a hidden private leaderboard, and no answers in the forum because nobody knows them yet. A medal (especially gold/solo-gold) is a genuine, externally-verified mark of skill.
  • legendary · Reproduce a deep RL algorithm (PPO/DQN) until it actually learns — RL is notoriously the hardest thing to reproduce — tiny bugs silently break training with no error and no loss signal to guide you. Implementing PPO or DQN from the paper and getting it to actually solve a control task is a legendary rite of passage in debugging from first principles.
  • brutal · Jane Street Monthly Puzzles — The original 'go hard' energy: math, logic, and probability puzzles released roughly monthly with a deep archive. Not ML-specific, but exactly the lateral problem-solving and probabilistic reasoning that ML rewards. Race the archive when you want a pure brain-bender.
  • brutal · Project Euler (hard tier, 300+) — Computational math problems where brute force is hopeless and you need an insight. The higher-numbered problems demand genuine mathematical cleverness plus efficient code — the same muscle that makes you good at the numerics under ML.
  • hard · Advent of Code (later days) — An annual December puzzle series whose final-week problems combine algorithmic depth with brutal edge-case handling under time pressure. The whole archive is open year-round — a fast, addictive way to keep the implement-it-correctly-and-fast muscle warm between ML builds.
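The "strict gradient-checking harness" in the CS231n item boils down to comparing an analytic gradient against a centered finite difference. The sketch below does that for a softmax cross-entropy layer written with the log-sum-exp shift, so large logits do not overflow. It is an illustrative NumPy version, not the course's own code; the function names, the step size h, and the random inputs are assumptions.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy and its gradient w.r.t. logits. logits: (N, C); labels: (N,) ints."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # log-sum-exp shift
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    n = logits.shape[0]
    loss = -log_probs[np.arange(n), labels].mean()
    grad = np.exp(log_probs)                # softmax probabilities
    grad[np.arange(n), labels] -= 1.0       # subtract the one-hot targets
    return loss, grad / n

def numerical_grad(f, x, h=1e-5):
    """Centered finite differences, one coordinate at a time (slow, for checks only)."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + h
        f_plus = f(x)
        x[idx] = old - h
        f_minus = f(x)
        x[idx] = old                         # restore the original entry
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 4))
labels = rng.integers(0, 4, size=6)

loss, analytic = softmax_cross_entropy(logits, labels)
numeric = numerical_grad(lambda z: softmax_cross_entropy(z, labels)[0], logits)
print("max abs difference:", np.max(np.abs(analytic - numeric)))

# The shift keeps extreme logits finite where a naive exp() would overflow.
big_loss, _ = softmax_cross_entropy(np.array([[1000.0, 0.0]]), np.array([0]))
print("loss with logit 1000:", big_loss)
```

Run the check in float64; in float32 the finite difference is too noisy to separate a real bug from rounding error.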

Keep curious

Blogs, people, communities, rabbit holes
  • BLOGS — Lilian Weng's Lil'Log (lilianweng.github.io) for deep survey posts; Chris Olah & the Distill.pub archive (distill.pub) for the gold standard of ML explanation; Sebastian Raschka's 'Ahead of AI' (magazine.sebastianraschka.com) for rigorous, practitioner-grade deep dives; Jay Alammar (jalammar.github.io) for the best visual explainers; Karpathy's blog (karpathy.github.io) — read every post, twice.
  • NEWSLETTERS — The Batch by Andrew Ng / DeepLearning.AI (deeplearning.ai/the-batch, the canonical weekly field digest); Sebastian Raschka's Ahead of AI; Import AI by Jack Clark (jack-clark.net, policy + frontier signal); The Gradient (thegradient.pub) for thoughtful long-form.
  • PEOPLE TO FOLLOW — Andrej Karpathy (@karpathy, the patron saint of learning ML from scratch), Jeremy Howard (@jeremyphoward, fast.ai), Sebastian Raschka (@rasbt), Andrew Ng (@AndrewYNg), Yann LeCun (@ylecun), Chris Olah (@ch402), Lilian Weng (@lilianweng), François Chollet (@fchollet), Jay Alammar (@jayalammar). Follow them on X and their blogs, and read what they read.
  • COMMUNITIES — r/MachineLearning (papers & discussion) and r/learnmachinelearning (the learning journey); the fast.ai forums (forums.fast.ai) — one of the kindest, highest-signal learning communities online; Kaggle's discussion forums (the single best place to learn how strong practitioners actually think); the EleutherAI and Hugging Face Discords for open-source frontier work; the nanoGPT GitHub discussions.
  • COMPETITIONS — Kaggle is home base (start with Getting Started, graduate to the monthly Playground Series, then Featured competitions for medals). Also: the ML Reproducibility Challenge (reproml.org, now a NeurIPS track), DrivenData (drivendata.org) for social-impact competitions, and AIcrowd (aicrowd.com) for research-flavored challenges.
  • CONFERENCES (watch the talks free online) — NeurIPS, ICML, ICLR are the big three for ML research; CVPR/ICCV for vision, ACL/EMNLP for NLP. You don't need to attend — recorded talks, tutorials, and the papers are all online, and skimming each year's 'best paper' awards is a great frontier pulse-check.
  • PRIMARY SOURCES — arXiv (arxiv.org, cs.LG / cs.CL / cs.CV) for where the field publishes, and Hugging Face Papers (huggingface.co/papers) to find papers WITH runnable code and community discussion (it absorbed the now-defunct Papers with Code). Build the habit of reading one paper a week with the three-pass method; reproduce one a quarter.
  • TOOLS TO GO DEEP ON — PyTorch internals and torch.compile; Hugging Face (transformers, datasets, the ecosystem, and the free huggingface.co/learn courses); Weights & Biases or MLflow for experiment tracking; nanoGPT and minGPT as reference implementations to study line by line; Optuna for hyperparameter search.
  • FRONTIER RABBIT HOLES — mechanistic interpretability (Anthropic's transformer-circuits.pub, and Neel Nanda's 'how to get started' guide on the Alignment Forum + his TransformerLens library — 'what is the model actually computing?'); the scaling-laws literature (Chinchilla, the original Kaplan paper); diffusion & generative modeling (fast.ai Part 2 reimplements Stable Diffusion from scratch); efficient training / systems-for-ML, a natural fit for your backend brain — FlashAttention, quantization (GPTQ/AWQ), and distributed training (FSDP, DeepSpeed).
  • IF THIS CLICKS, GO DEEPER HERE — LLMs/NLP: Karpathy's later videos + the Hugging Face course (huggingface.co/learn); Deep RL: OpenAI Spinning Up (spinningup.openai.com) + David Silver's RL course on YouTube; Computer Vision: finish CS231n; ML Systems/MLOps: 'Designing Machine Learning Systems' by Chip Huyen + Made With ML (madewithml.com); Theory: ESL cover-to-cover, then Kevin Murphy's 'Probabilistic Machine Learning' (probml.github.io, free).
  • MATH BACKBONE TO KEEP SHARPENING — 3Blue1Brown for geometric intuition; Mathematics for Machine Learning (mml-book.github.io, free) when you want the linear algebra / calculus / probability foundations made rigorous; Boyd & Vandenberghe's Convex Optimization (stanford.edu/~boyd/cvxbook, free) once optimization fascinates you.

How you'll know you've actually got it
  • You can derive backpropagation for an arbitrary feed-forward network on a whiteboard, from the chain rule, with no notes — and explain why a learning rate 10x too large diverges.
  • Given a raw, messy CSV nobody cleaned, you can produce a leak-free, properly cross-validated model with a write-up justifying every preprocessing and evaluation choice.
  • You've implemented linear/logistic regression, a decision tree, a kernel SVM, a transformer, and an autograd engine from scratch, and your gradients match PyTorch to machine precision.
  • You can read a new ML paper with the three-pass method and reproduce its core result from the equations alone — matching the reported numbers within a reasonable margin.
  • When a training run misbehaves, you debug it methodically from first principles (data? init? LR? a real bug?) instead of randomly tweaking hyperparameters and hoping.
  • You've earned a medal (or at least placed top 25%) in a LIVE Kaggle competition against motivated competitors — with a local CV score you trusted more than the public leaderboard, and were right.
  • You can explain the bias-variance tradeoff, regularization, and self-attention clearly enough that a confused beginner suddenly gets it — teaching is effortless because the understanding is real.
  • You instinctively reach for the simplest model that could work and can articulate exactly when added complexity (a deeper net, a bigger ensemble) is and isn't worth it.
  • You've written at least one public explainer or reproduction that strangers found genuinely useful — your understanding now produces artifacts others learn from.
  • You no longer feel the field is a pile of disconnected tricks: regularization, optimization, and architecture feel like one coherent story, and you can place any new technique on that map.
