Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

3. Ensemble Methods: Boosting

Lesson 4 of 22 in the free Machine Learning II notes on Siksha Sarovar, written by Rohit Jangra.

Boosting is a sequential ensemble method that combines many weak learners (models only slightly better than random guessing) into a single strong learner. Schapire (1990) first proved that such a conversion is possible. AdaBoost, introduced by Freund and Schapire (1997), earned its authors the Gödel Prize in 2003 and became one of the most influential ML algorithms.

Core Idea

Train classifiers sequentially. Each successive classifier focuses on examples that previous classifiers got wrong, by increasing their weights in the training distribution.

AdaBoost Algorithm

  1. Initialize weights: w_i = 1/N for all i = 1,...,N (labels are y_i in {-1, +1})
  2. For t = 1 to T:
     a. Train weak learner h_t on the weighted data distribution
     b. Compute weighted error: eps_t = sum_i(w_i * I[h_t(x_i) != y_i])
     c. Compute learner weight: alpha_t = 0.5 * ln((1 - eps_t) / eps_t)
     d. Update sample weights: w_i = w_i * exp(-alpha_t * y_i * h_t(x_i))
     e. Normalize: w_i = w_i / sum_j(w_j)
  3. Final predictor: H(x) = sign(sum_{t=1}^{T} alpha_t * h_t(x))
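
The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: it assumes one-dimensional inputs, labels in {-1, +1}, and threshold stumps as the weak learner. The helper names (train_stump, stump_predict, adaboost, predict) and the toy data set are made up for this example.

```python
import math

def stump_predict(x, thr, pol):
    """Threshold stump: predict pol when x > thr, otherwise -pol."""
    return pol if x > thr else -pol

def train_stump(X, y, w):
    """Return (weighted_error, thr, pol) of the best stump under weights w."""
    xs = sorted(set(X))
    # Candidate thresholds: one below all points, then midpoints of neighbours.
    thresholds = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = None
    for thr in thresholds:
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(xi, thr, pol) != yi)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best

def adaboost(X, y, T):
    n = len(X)
    w = [1.0 / n] * n                                 # step 1: uniform weights
    ensemble = []
    for _ in range(T):                                # step 2
        eps, thr, pol = train_stump(X, y, w)          # 2a + 2b
        eps = min(max(eps, 1e-10), 1 - 1e-10)         # guard against log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)       # 2c
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, thr, pol))
             for xi, yi, wi in zip(X, y, w)]          # 2d
        total = sum(w)
        w = [wi / total for wi in w]                  # 2e
        ensemble.append((alpha, thr, pol))
    return ensemble

def predict(ensemble, x):
    """Step 3: sign of the alpha-weighted vote (ties go to +1)."""
    score = sum(a * stump_predict(x, thr, pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy data: no single threshold separates the classes.
X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost(X, y, T=3)
accuracy = sum(predict(ensemble, xi) == yi for xi, yi in zip(X, y)) / len(X)
print("training accuracy with 3 stumps:", accuracy)
```

On this toy set a single stump always misclassifies some points, while the weighted vote of three stumps fits the training data. Real implementations use multi-dimensional inputs and faster split search.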

Gradient Boosting

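Gradient boosting generalizes boosting to any differentiable loss function. Each new tree h_t is fit to the negative gradient of the loss with respect to the current predictions (the pseudo-residuals), and the model is updated as: F_t(x) = F_{t-1}(x) + learning_rate * h_t(x)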
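
As a rough sketch of this update rule, the code below runs gradient boosting for squared loss, where the negative gradient is simply the residual y - F(x). It assumes one-dimensional inputs and uses regression stumps in place of deeper trees. The function names (fit_stump, gradient_boost, gb_predict, mse) and the toy data are hypothetical and chosen only for illustration.

```python
def fit_stump(X, r):
    """Least-squares regression stump: returns (thr, left_mean, right_mean)."""
    xs = sorted(set(X))
    best = None
    for a, b in zip(xs, xs[1:]):
        thr = (a + b) / 2
        left = [ri for xi, ri in zip(X, r) if xi <= thr]
        right = [ri for xi, ri in zip(X, r) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((ri - lm) ** 2 for ri in left)
               + sum((ri - rm) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1:]

def gradient_boost(X, y, n_rounds, learning_rate=0.1):
    f0 = sum(y) / len(y)          # F_0: the constant that minimises squared loss
    preds = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Pseudo-residuals: negative gradient of 0.5 * (y - F)^2 w.r.t. F.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        thr, lm, rm = fit_stump(X, residuals)
        stumps.append((thr, lm, rm))
        # F_t(x) = F_{t-1}(x) + learning_rate * h_t(x)
        preds = [pi + learning_rate * (lm if xi <= thr else rm)
                 for xi, pi in zip(X, preds)]
    return f0, learning_rate, stumps

def gb_predict(model, x):
    f0, lr, stumps = model
    return f0 + sum(lr * (lm if x <= thr else rm) for thr, lm, rm in stumps)

def mse(model, X, y):
    return sum((gb_predict(model, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(X)

# Toy regression problem: y = x^2 on ten points.
X = [float(i) for i in range(10)]
y = [xi ** 2 for xi in X]

for rounds in (0, 5, 50):
    model = gradient_boost(X, y, n_rounds=rounds)
    print(rounds, "rounds -> training MSE", round(mse(model, X, y), 3))
```

A smaller learning_rate shrinks each step, so more rounds are needed to reach the same training error. For other losses only the residuals line changes, because the pseudo-residuals are then a different gradient.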

Popular implementations: XGBoost adds L1 and L2 regularization to the training objective, and LightGBM uses histogram-based split finding.

AdaBoost vs Gradient Boosting

| Feature | AdaBoost | Gradient Boosting |
|---|---|---|
| Weak learner | Decision stumps | Shallow trees (depth 3-8) |
| Mechanism | Re-weight samples | Fit pseudo-residuals |
| Loss function | Exponential | Any differentiable loss |
| Robustness | Sensitive to noise | More robust with regularization |

Theoretical Guarantee

AdaBoost's training error decreases exponentially with the number of rounds T: Training Error <= exp(-2 * sum_{t=1}^{T} gamma_t^2), where gamma_t = 0.5 - eps_t is the edge of weak learner t (how much better than random guessing it is).
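
To get a feel for the numbers, the short sketch below evaluates this bound for a hypothetical sequence of weighted errors. It also computes the tighter product form prod_t 2 * sqrt(eps_t * (1 - eps_t)) from the standard AdaBoost analysis, which the notes do not state. The function name adaboost_bounds and the error values are assumptions made for the example.

```python
import math

def adaboost_bounds(errors):
    """Return (product_bound, exp_bound) on training error for given eps_t values."""
    gammas = [0.5 - e for e in errors]                 # edges gamma_t
    product_bound = math.prod(2 * math.sqrt(e * (1 - e)) for e in errors)
    exp_bound = math.exp(-2 * sum(g * g for g in gammas))
    return product_bound, exp_bound

# Hypothetical case: every weak learner has weighted error 0.4 (edge 0.1).
for T in (10, 100, 500):
    product_bound, exp_bound = adaboost_bounds([0.4] * T)
    print(T, "rounds: product bound", product_bound, "exp bound", exp_bound)
```

Even an edge as small as 0.1 drives the bound towards zero once T is large, which is why weak learners only need to be slightly better than random.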

Common Pitfalls

  • Too many rounds can overfit if base learners are too complex
  • Sensitive to noisy labels (outliers get progressively upweighted)
  • Learning rate and tree depth require careful tuning

Exam-Ready Summary

  • AdaBoost: exponential loss, re-weights misclassified samples, typical weak learner = decision stump
  • Gradient Boosting: any differentiable loss, fits pseudo-residuals, flexible
  • XGBoost adds regularization terms (L1 + L2) to gradient boosting
  • Boosting reduces bias; unlike bagging, it is sequential and not trivially parallelizable
  • Edge gamma_t must be > 0 for the exponential convergence guarantee to apply