Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

Machine Learning II — Free Notes & Tutorial

Free Advanced Machine Learning (ML2) notes for BCA students at Siksha Sarovar, covering deep learning, ensemble methods and NLP.

This Machine Learning II course is part of Siksha Sarovar and is 100% free for students in India — no sign-up required to read. It contains 22 structured lessons with examples, and pairs with our free online compiler and AI tutor.

What you will learn

  • Deep learning
  • Ensemble methods
  • NLP
  • Advanced ML

Course content (22 lessons)

  • Unit I Overview: Combining Different Models
    Ensemble methods combine multiple base learners to produce a more powerful predictor. Condorcet's Jury Theorem (1785) shows that when…
  • 1. Evaluating ML Algorithms & Model Selection
    Model evaluation is the foundation of trustworthy ML. Without rigorous evaluation, we cannot distinguish genuine generalization…
  • 2. Introduction to Statistical Learning Theory
    Statistical Learning Theory (SLT) provides mathematical foundations for understanding when and why ML generalizes. Developed by…
  • 3. Ensemble Methods: Boosting
    Boosting is a sequential ensemble method that converts many weak learners (models slightly better than random guessing) into a powerful strong…
  • 4. Ensemble Methods: Bagging
    Bagging (Bootstrap AGGregatING), introduced by Breiman (1994), is a parallel ensemble method that reduces variance by training multiple models on…
  • 5. Ensemble Methods: Random Forests
    Random Forests, introduced by Breiman (2001), extend bagging by adding random feature selection at each split, further decorrelating base trees…
  • Unit II Overview: Dimensionality Reduction
    High-dimensional data poses fundamental computational and statistical challenges. Dimensionality reduction methods find compact,…
  • 6. Linear Discriminant Analysis (LDA)
    Linear Discriminant Analysis was introduced by R.A. Fisher in 1936 as a method to find the linear combination of features that best separates…
  • 7. Principal Component Analysis (PCA)
    PCA is the most widely used unsupervised dimensionality reduction technique. Developed by Pearson (1901) and Hotelling (1933), PCA finds an…
  • 8. Kernel PCA
    Kernel PCA extends PCA to non-linearly separable data by first mapping inputs to a high-dimensional feature space using a kernel function, then applying PCA in that…
  • 9. Factor Analysis
    Factor Analysis (FA) is a probabilistic generative model that explains observed variables as linear combinations of a small number of latent factors plus unique…
  • 10. Independent Component Analysis (ICA)
    ICA is a computational technique for separating a multivariate signal into statistically independent non-Gaussian components. The classic…
  • Unit III Overview: Learning With Neural Networks
    Artificial Neural Networks (ANNs) are inspired by the biological neural networks in animal brains. The modern deep learning…
  • 11. The Perceptron
    The Perceptron, invented by Frank Rosenblatt in 1957, was the first algorithmically described neural network. It sparked enormous optimism but was later shown…
  • 12. Multilayer Neural Networks & Backpropagation
    Multilayer Perceptrons (MLPs) overcome the linearity limitation of single perceptrons by stacking layers of neurons with…
  • 13. Learning Neural Network Structures
    The architecture and regularization of a neural network are just as important as the training algorithm itself. This lesson covers practical…
  • 14. Deep Learning & Feature Representation Learning
    Deep learning is characterized by learning hierarchical feature representations directly from raw data. Rather than…
  • Unit IV Overview: Reinforcement Learning
    Reinforcement Learning (RL) addresses the problem of learning to act in an environment to maximize cumulative reward. Unlike supervised…
  • 15. Elements of Reinforcement Learning
    This lesson formalizes the key components of the RL framework: policies, value functions, and the role of the discount factor in shaping…
  • 16. Generalization in Reinforcement Learning
    Generalization in RL is the ability to perform well on states not seen during training. Deep RL extends the Q-function and policy to…
  • 17. Policy Search
    Policy search methods directly optimize the policy parameters without necessarily learning a value function. They are particularly powerful for continuous action…
  • 18. Adaptive Dynamic Programming
    Adaptive Dynamic Programming (ADP) bridges dynamic programming (which requires a model) and model-free RL (which learns from experience). It…

Unit I Overview: Combining Different Models

Ensemble methods combine multiple base learners to produce a more powerful predictor. Condorcet's Jury Theorem (1785) shows that when independent voters are each correct with probability greater than 1/2, the probability that the majority vote is correct approaches certainty as the number of voters grows. Machine learning exploits this idea through bagging, boosting, and stacking.
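The theorem can be checked numerically. The sketch below assumes the idealized setting of the theorem (every voter independent and correct with the same probability p); the function name `majority_vote_accuracy` is illustrative, not part of any library. Real base learners are correlated, so actual ensemble gains are smaller than this ideal.

```python
from math import comb

def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent voters is correct,
    when each voter is correct with probability p (n must be odd: no ties)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    k_min = n // 2 + 1  # smallest strict majority
    # Binomial tail: sum over k = k_min..n of C(n, k) p^k (1 - p)^(n - k)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

for n in (1, 11, 101, 1001):
    print(n, round(majority_vote_accuracy(0.6, n), 4))
```

With p = 0.6 the majority accuracy rises toward 1 as n grows; with p below 0.5 the same formula shows the majority getting *worse*, which is why the "better than chance" condition matters.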

Why Ensembles Work

For squared-error loss, the expected test error decomposes as: Total Error = Bias^2 + Variance + Irreducible Noise

  • Bagging trains models in parallel on bootstrap samples — reduces variance by averaging.
  • Boosting trains models sequentially, correcting previous errors — reduces bias.
  • Stacking trains a meta-model on the base learners' predictions to learn how best to combine diverse models.
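As a minimal sketch of the bootstrap step in bagging (the helper name `bootstrap_sample` is hypothetical), each base model gets n row indices drawn with replacement from the n training rows. A well-known consequence is that roughly 36.8% of rows are left out of any one sample ("out-of-bag"), because (1 - 1/n)^n approaches e^-1 as n grows.

```python
import random

def bootstrap_sample(n: int, rng: random.Random) -> list[int]:
    """Draw n row indices with replacement: the training set for one bagged model."""
    return [rng.randrange(n) for _ in range(n)]

rng = random.Random(42)
n = 10_000
sample = bootstrap_sample(n, rng)

# Rows never drawn are "out-of-bag" for this model.
oob_fraction = 1 - len(set(sample)) / n
print(f"out-of-bag fraction: {oob_fraction:.3f}")
print(f"theory (1 - 1/n)**n: {(1 - 1 / n) ** n:.3f}")
```

Each bagged model sees a slightly different dataset, which is what makes their errors differ; the out-of-bag rows can also serve as a free validation set for that model.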

Unit I Roadmap

Topic | Core Technique | Primary Benefit
Model Evaluation | Cross-validation, AUC | Reliable performance estimates
Statistical Learning Theory | PAC learning, VC dimension | Formal generalization bounds
Boosting | AdaBoost, Gradient Boosting | Bias reduction
Bagging | Bootstrap aggregating | Variance reduction
Random Forests | Bagging + random feature selection at each split | Further variance reduction via decorrelated trees

No Free Lunch Theorem

No single algorithm outperforms all others on every problem distribution. This motivates ensembles: diverse models cover different regions of the hypothesis space, which makes performance more robust across problems.

Requirements for Effective Ensembles

  1. Each base learner must be better than random guessing (for binary classification, accuracy above 50%).
  2. Base learners must make diverse, uncorrelated errors — diversity is paramount.
  3. A combination rule (vote, average, or meta-learner) must aggregate predictions.
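The third requirement, a combination rule, can be as simple as the two functions below: a hard majority vote over predicted labels and a soft vote that averages predicted probabilities. Both are minimal sketches for binary labels 0/1 with hypothetical names, and the toy predictions are invented to show three models that are each right on only 2 of 3 samples but wrong on *different* samples.

```python
from collections import Counter

def hard_vote(predictions: list[list[int]]) -> list[int]:
    """Majority vote per sample; predictions[m][i] is model m's label for sample i.
    On a tie, Counter.most_common returns the label seen first."""
    n_samples = len(predictions[0])
    return [
        Counter(model[i] for model in predictions).most_common(1)[0][0]
        for i in range(n_samples)
    ]

def soft_vote(probas: list[list[float]], threshold: float = 0.5) -> list[int]:
    """Average each model's P(label = 1) per sample, then threshold (binary case)."""
    n_models, n_samples = len(probas), len(probas[0])
    return [
        int(sum(model[i] for model in probas) / n_models >= threshold)
        for i in range(n_samples)
    ]

truth = [1, 1, 0]
labels = [[1, 0, 0], [0, 1, 0], [1, 1, 1]]  # each model is right on 2 of 3 samples
probas = [[0.9, 0.4, 0.2], [0.4, 0.8, 0.1], [0.7, 0.6, 0.6]]
print(hard_vote(labels) == truth, soft_vote(probas) == truth)  # True True
```

Because the errors do not overlap, both rules recover every label. If all three models made the same mistake, no combination rule could fix it, which is the diversity requirement in action.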

Exam-Ready Summary

  • Ensemble error depends on both the individual models' errors and the correlation between them
  • Lower inter-model correlation means greater variance reduction
  • Bagging: parallel, best with unstable high-variance models (deep trees)
  • Boosting: sequential, best with simple, low-variance, high-bias weak learners (e.g., decision stumps)
  • Diversity is essential — identical models provide zero ensemble benefit
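The first two summary points can be made quantitative under a simplifying assumption: if M models each have error variance sigma^2 and every pair of models has correlation rho, the variance of their average is rho * sigma^2 + (1 - rho) * sigma^2 / M. The sketch below (hypothetical helper names) evaluates that formula and checks it with a small seeded simulation; rho = 1 leaves the variance at sigma^2, the "identical models give zero benefit" case.

```python
import random
import statistics

def ensemble_variance(sigma2: float, rho: float, m: int) -> float:
    """Variance of the average of m models, each with variance sigma2 and
    pairwise correlation rho (equicorrelated assumption)."""
    return rho * sigma2 + (1 - rho) * sigma2 / m

def simulate(sigma: float, rho: float, m: int, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check: each model's error = shared part z + private part e_i,
    mixed so that Var = sigma^2 and pairwise correlation = rho."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        z = rng.gauss(0, 1)  # shared error component (drives correlation)
        errors = [sigma * (rho ** 0.5 * z + (1 - rho) ** 0.5 * rng.gauss(0, 1)) for _ in range(m)]
        means.append(sum(errors) / m)
    return statistics.pvariance(means)

for rho in (0.0, 0.5, 0.9):
    print(rho, ensemble_variance(1.0, rho, 10), round(simulate(1.0, rho, 10), 3))
```

Adding more models only shrinks the (1 - rho) / M term; the rho * sigma^2 floor can only be lowered by making the models less correlated, which is exactly what random forests try to do.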

1. Evaluating ML Algorithms & Model Selection

Model evaluation is the foundation of trustworthy ML. Without rigorous evaluation, we cannot distinguish genuine generalization from overfitting — a model that memorizes training data but fails in production.

Bias-Variance Decomposition

For squared-error loss, the expected generalization error decomposes as: Total Error = Bias^2 + Variance + Irreducible Noise

  • Bias: Error from incorrect assumptions. High bias implies underfitting — the model is too simple.
  • Variance: Error from sensitivity to training data. High variance implies overfitting — the model is too complex.
  • Sweet spot: the model complexity that minimizes the sum bias^2 + variance (lowering one term typically raises the other).
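To make the trade-off concrete, here is a small Monte Carlo sketch in a toy setting that is an assumption, not part of the lesson: the target is a constant mu = 1 observed with Gaussian noise (sigma = 2), each training set has n = 5 points, and the "model" predicts c * mean(sample). Shrinking (c < 1) adds bias but cuts variance. In theory bias^2 = (1 - c)^2 and variance = 0.8 * c^2, so their sum is smallest at c = 0.5 on the grid below, and the measured total error adds the irreducible noise sigma^2 = 4 on top.

```python
import random
import statistics

def bias_variance(c: float, mu: float = 1.0, sigma: float = 2.0, n: int = 5,
                  trials: int = 20000, seed: int = 0) -> tuple[float, float, float]:
    """Monte Carlo estimates of bias^2, variance and expected squared error for
    the shrunken-mean predictor c * mean(sample) on a fresh noisy target."""
    rng = random.Random(seed)
    preds, sq_errors = [], []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]  # a new training set
        pred = c * statistics.fmean(sample)
        y_new = rng.gauss(mu, sigma)  # fresh test target with irreducible noise
        preds.append(pred)
        sq_errors.append((y_new - pred) ** 2)
    bias2 = (statistics.fmean(preds) - mu) ** 2
    variance = statistics.pvariance(preds)
    return bias2, variance, statistics.fmean(sq_errors)

for c in (0.0, 0.25, 0.5, 0.75, 1.0):
    b2, var, total = bias_variance(c)
    print(f"c={c:.2f}  bias^2={b2:.3f}  variance={var:.3f}  total={total:.3f}")
```

Here c plays the role of model flexibility: c = 0 is the high-bias extreme (always predict 0), c = 1 is the unbiased but highest-variance estimator.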

Cross-Validation Techniques

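Method | How It Works | When to Use
Holdout | Single train/validation split (e.g., 80/20) | Large datasets, quick checks
k-Fold CV | Data split into k folds; each fold serves once as the validation set | Standard practice
Stratified k-Fold | k-fold with class proportions preserved in every fold | Imbalanced classes
LOOCV | n-fold CV: each sample is held out once | Very small datasets
Time-Series CV | Walk-forward splits: train on the past, validate on the next period | Sequential/temporal data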
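The splitting schemes in the table can be sketched in a few lines of plain Python. These are minimal illustrations with hypothetical function names, not a library API; LOOCV is simply `k_fold_indices(n, n)`.

```python
import random
from collections import defaultdict

def k_fold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def stratified_k_fold_indices(labels: list[int], k: int, seed: int = 0) -> list[list[int]]:
    """Deal each class's shuffled indices round-robin across folds so every
    fold keeps roughly the overall class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for members in by_class.values():
        rng.shuffle(members)
        for i in members:
            folds[pos % k].append(i)
            pos += 1
    return folds

def walk_forward_splits(n: int, n_splits: int) -> list[tuple[range, range]]:
    """Time-series CV: train on a growing prefix, validate on the next block."""
    block = n // (n_splits + 1)
    return [(range(0, block * s), range(block * s, block * (s + 1)))
            for s in range(1, n_splits + 1)]

labels = [0] * 90 + [1] * 10  # 10% positives
for fold in stratified_k_fold_indices(labels, 5):
    print(len(fold), sum(labels[i] for i in fold))  # 20 2 for every fold

for train, valid in walk_forward_splits(100, 4):
    print(train, valid)
```

In the time-series splits every validation index comes after every training index, so the model is never trained on the future it is evaluated on.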

Classification Metrics

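Metric | Formula | Prefer When
Accuracy | (TP+TN)/Total | Balanced classes
Precision | TP/(TP+FP) | False positives are costly
Recall | TP/(TP+FN) | False negatives are costly
F1-Score | 2PR/(P+R), where P = precision, R = recall | Imbalanced datasets
AUC-ROC | Area under the ROC curve (TPR vs FPR across thresholds) | Threshold-independent comparison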
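The formulas in the table translate directly into code. The sketch below uses hypothetical helper names and invented counts; AUC is computed through its probabilistic meaning (the chance that a randomly chosen positive is scored above a randomly chosen negative, ties counting one half), which is fine for small examples.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision, recall and F1 from confusion-matrix counts.
    Zero denominators return 0.0 by convention in this sketch."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f1": f1}

def auc_roc(scores: list[float], labels: list[int]) -> float:
    """AUC = P(random positive scores higher than random negative); O(P * N)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# 1000 samples, 10 positives: "always predict negative" still scores 99% accuracy.
print(classification_metrics(tp=0, fp=0, fn=10, tn=990))
print(classification_metrics(tp=8, fp=4, fn=2, tn=986))
print(auc_roc([0.9, 0.8, 0.4, 0.35, 0.1], [1, 0, 1, 0, 0]))
```

The first call shows why accuracy misleads on imbalanced data: recall and F1 are both 0 even though accuracy is 0.99.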

Model Selection Strategies

  1. Grid Search: Exhaustive sweep over hyperparameter grid.
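  2. Random Search: Sample hyperparameters from specified distributions. For the same number of trials it often matches or beats grid search, especially when only a few hyperparameters strongly affect performance.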
  3. Bayesian Optimization: Build a surrogate model to focus on promising hyperparameter regions.
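A minimal sketch contrasting grid search and random search under the same budget of 9 trials. The `validation_score` function is a hypothetical stand-in for a cross-validated score (it peaks at a learning rate of 0.03 and barely depends on the regularization strength), and sampling the learning rate log-uniformly is an assumption of this sketch. Bayesian optimization is not shown.

```python
import itertools
import math
import random

def validation_score(lr: float, reg: float) -> float:
    """Hypothetical CV score: peaks at lr = 0.03, depends only weakly on reg."""
    return -(math.log10(lr) - math.log10(0.03)) ** 2 - 0.01 * reg

def grid_search(lrs: list[float], regs: list[float]) -> tuple[float, float]:
    """Evaluate every (lr, reg) combination and keep the best."""
    return max(itertools.product(lrs, regs), key=lambda p: validation_score(*p))

def random_search(n_trials: int, seed: int = 0) -> tuple[float, float]:
    """Sample lr log-uniformly in [1e-4, 1] and reg uniformly in [0, 1]."""
    rng = random.Random(seed)
    trials = [(10 ** rng.uniform(-4, 0), rng.uniform(0, 1)) for _ in range(n_trials)]
    return max(trials, key=lambda p: validation_score(*p))

best_grid = grid_search([1e-4, 1e-2, 1.0], [0.0, 0.5, 1.0])  # 9 trials, 3 distinct lr values
best_rand = random_search(9)                                 # 9 trials, 9 distinct lr values
print("grid:  ", best_grid, round(validation_score(*best_grid), 4))
print("random:", best_rand, round(validation_score(*best_rand), 4))
```

The design point is the count of distinct values tried for the hyperparameter that matters: the 3x3 grid spends six of its nine trials re-testing the same three learning rates, while random search tries nine different ones.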

Common Pitfalls

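  • Data leakage: Fitting preprocessing steps (scaling, imputation, feature selection) on data that includes the validation or test set, or using information unavailable at prediction time, inflates results.
  • Test set reuse: Repeatedly evaluating many models on the same test set and keeping the best one overfits to that set and inflates apparent performance.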
  • Wrong metric: Accuracy is misleading on highly imbalanced datasets.
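A minimal sketch of leakage-safe preprocessing, using standardization as the example step (helper names are hypothetical, data invented): the scaler's mean and standard deviation are learned from the training fold only and then reused on the validation fold. The leaky variant computes them on training plus validation data, so the validation points influence the transformation and look far less extreme than they would to a deployed model.

```python
import statistics

def fit_scaler(train: list[float]) -> tuple[float, float]:
    """Learn standardization parameters (mean, population std) from training data only."""
    return statistics.fmean(train), statistics.pstdev(train)

def transform(values: list[float], mean: float, std: float) -> list[float]:
    """Apply previously learned parameters; never refit on the data being transformed."""
    return [(v - mean) / std for v in values]

def leaky_vs_safe(train: list[float], valid: list[float]) -> tuple[list[float], list[float]]:
    # Leaky: parameters computed on train + validation, so validation data
    # shapes the preprocessing.
    leaky_mean, leaky_std = fit_scaler(train + valid)
    # Safe: fit on the training fold, then reuse the same parameters on validation.
    mean, std = fit_scaler(train)
    return transform(valid, leaky_mean, leaky_std), transform(valid, mean, std)

leaky, safe = leaky_vs_safe([1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 12.0])
print("leaky:", [round(v, 2) for v in leaky])
print("safe: ", [round(v, 2) for v in safe])
```

Inside k-fold CV the same rule applies per fold: refit the preprocessing on each training split before transforming that fold's validation split.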

Exam-Ready Summary

  • 5-fold or 10-fold CV is standard; LOOCV is nearly unbiased but computationally expensive, and its estimate can have high variance
  • Stratify folds for classification tasks, especially with imbalanced classes
  • High bias: increase model complexity or add features
  • High variance: add data, apply regularization, reduce model complexity
  • Data leakage is the most dangerous source of misleading evaluation results

Frequently asked questions

Is the Machine Learning II course really free?

Yes. The entire Machine Learning II course on Siksha Sarovar is free to read with no account required. You can optionally sign in with Google to save your progress.

Do I get a certificate for Machine Learning II?

Yes — finish the lessons and pass the quiz to earn a free, verifiable certificate you can share on LinkedIn or with recruiters.

Can I run code while learning?

Yes. The built-in online compiler runs C, C++, Python, Java, PHP, JavaScript, C# and SQL directly in your browser — no installation needed.