Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

Privacy Policy | Terms of Service | Contact Siksha Sarovar | About Siksha Sarovar

v4.0.9 · PWA
Siksha Sarovar logo
Siksha Sarovar
Your Learning Universe

Siksha Sarovar is a free e-learning platform for coding courses, BCA university notes and competitive exam preparation. Optional Google sign-in saves your learning progress across devices.

Initializing knowledge base…
Compiling modules 0%

2. Introduction to Statistical Learning Theory

Lesson 3 of 22 in the free Machine Learning II notes on Siksha Sarovar, written by Rohit Jangra.

2. Introduction to Statistical Learning Theory

Statistical Learning Theory (SLT) provides mathematical foundations for understanding when and why ML generalizes. Developed by Vapnik and Chervonenkis in the 1970s and formalized as PAC learning by Valiant (1984), SLT gives provable guarantees about generalization from finite training data.

Probably Approximately Correct (PAC) Learning

A hypothesis class H is PAC-learnable if there exists an algorithm A such that for any distribution D, any target f in H, and parameters epsilon, delta > 0: P[error(h) > epsilon] <= delta

The required training samples m satisfy: m >= (1/epsilon) * (ln(|H|) + ln(1/delta))

This theorem tells us how many training examples are needed to guarantee an approximately correct hypothesis with high probability.

VC Dimension

The Vapnik-Chervonenkis (VC) dimension measures the capacity of a hypothesis class H.

  • VC(H) = d means H can shatter d points (assign every possible labeling correctly)
  • VC(linear classifiers in R^n) = n+1
  • VC(decision stumps in 1D) = 1

Generalization bound: error(h) <= error_train(h) + sqrt((VC * ln(2m/VC) + ln(1/delta)) / (2m))

Higher VC dimension = more expressive model = more data required.

PAC Learning vs SRM

FrameworkSelects HypothesisKey Insight
ERMMinimizes training errorMay overfit if VC dim is large
SRMMinimizes training error + complexity penaltyBalances fit and complexity
PACGuarantees generalization with probability 1-deltaSample complexity bounds

Rademacher Complexity

A modern, tighter alternative to VC dimension. Measures how well a hypothesis class fits random noise labels. Lower Rademacher complexity implies better generalization. R_m(H) = E[sup_{h in H} (1/m) sum(sigma_i * h(x_i))]

Common Pitfalls

  • PAC bounds are often loose (pessimistic) in practice — they provide insight, not tight predictions
  • VC dimension assumes the worst-case input distribution
  • Real data has structure that makes learning easier than worst-case bounds suggest

Exam-Ready Summary

  • PAC learning: algorithm that returns an approximately correct hypothesis with high probability
  • VC dimension: maximum number of points that any hypothesis in H can shatter
  • Larger VC dimension = more expressive = more training data needed to generalize
  • SRM: structural risk minimization — balance empirical risk with VC complexity penalty
  • Generalization bound = training error + complexity-dependent confidence interval term