6. Linear Discriminant Analysis (LDA)

Lesson 8 of 22 in the free Machine Learning II notes on Siksha Sarovar, written by Rohit Jangra.

Linear Discriminant Analysis was introduced by R.A. Fisher in 1936 as a method to find the linear combination of features that best separates two or more classes. Unlike PCA (which ignores class labels), LDA is a supervised dimensionality reduction technique that maximizes class separability.

Fisher's Criterion

LDA finds a projection matrix W that maximizes the ratio of between-class scatter to within-class scatter, where |.| denotes the determinant: J(W) = |W^T S_B W| / |W^T S_W W|

Where:

  • S_B = between-class scatter matrix = sum_c n_c * (mu_c - mu)(mu_c - mu)^T
  • S_W = within-class scatter matrix = sum_c sum_{x in c} (x - mu_c)(x - mu_c)^T
  • n_c = number of samples in class c, mu_c = mean of class c, mu = overall mean of all samples

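The columns of the optimal W are the eigenvectors of S_W^{-1} * S_B with the largest eigenvalues. Because S_B has rank at most C-1 for C classes, at most C-1 eigenvalues are nonzero.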
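The computation above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the function name fisher_lda, the toy three-class data, and the use of a pseudo-inverse in place of a plain inverse are all assumptions made for the example.

```python
import numpy as np

def fisher_lda(X, y, n_components):
    """Return the leading LDA directions (columns of W) and all eigenvalues."""
    n_features = X.shape[1]
    mu = X.mean(axis=0)                          # overall mean
    S_W = np.zeros((n_features, n_features))     # within-class scatter
    S_B = np.zeros((n_features, n_features))     # between-class scatter
    for c in np.unique(y):
        X_c = X[y == c]
        mu_c = X_c.mean(axis=0)                  # class mean
        centered = X_c - mu_c
        S_W += centered.T @ centered
        diff = (mu_c - mu).reshape(-1, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)
    # Eigen-decomposition of S_W^{-1} S_B (pinv is an assumption for robustness)
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]       # largest eigenvalue first
    W = eigvecs[:, order[:n_components]].real
    return W, eigvals.real[order]

# Hypothetical toy data: three Gaussian blobs in 4 dimensions
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0], [3, 3, 0, 0], [0, 3, 3, 0]], dtype=float)
X = np.vstack([rng.normal(m, 1.0, size=(50, 4)) for m in means])
y = np.repeat([0, 1, 2], 50)

W, eigvals = fisher_lda(X, y, n_components=2)
Z = X @ W                                        # projected data, shape (150, 2)
print(W.shape, Z.shape)
print(np.round(eigvals, 6))                      # only C-1 = 2 values are non-negligible
```

With three classes, only two eigenvalues should be meaningfully larger than zero, which is why LDA yields at most C-1 useful components.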

LDA vs PCA

Aspect            | LDA                                | PCA
Uses class labels | Yes (supervised)                   | No (unsupervised)
Objective         | Maximize class separation          | Maximize variance
Max components    | min(n_classes-1, n_features)       | min(n_samples, n_features)
Best for          | Classification                     | Visualization, compression
Assumes           | Gaussian classes, equal covariance | No distributional assumption

LDA Assumptions

  1. Gaussian class distributions: Each class is multivariate Gaussian.
  2. Equal class covariances: All classes share the same covariance matrix (homoscedasticity).
  3. Linear separability: The classes differ mainly in their means, so that linear boundaries can separate them in the projected space.

When these assumptions hold, the LDA classifier coincides with the Bayes optimal classifier for Gaussian classes.
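To see why the resulting boundary is linear, it can help to write out the discriminant function that follows from the Gaussian, equal-covariance assumptions. The symbols \Sigma (shared covariance matrix) and \pi_c (prior probability of class c) are introduced here for illustration and do not appear in the notes above; \mu_c is the class mean as before.

```latex
% Log-posterior of class c, dropping terms that are identical for every class
\delta_c(x) = x^{T} \Sigma^{-1} \mu_c
              - \tfrac{1}{2}\, \mu_c^{T} \Sigma^{-1} \mu_c
              + \log \pi_c ,
\qquad
\hat{y}(x) = \arg\max_{c} \; \delta_c(x)
```

The quadratic term x^T \Sigma^{-1} x is the same for every class when the covariance is shared, so it cancels and \delta_c(x) is linear in x. If each class had its own covariance matrix, the quadratic term would remain, which is the QDA case.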

Worked Example: Wine Dataset

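The UCI Wine dataset has 13 features and 3 classes, so LDA can produce at most 2 components. A KNN classifier trained on the 2-component LDA projection achieves ~98% accuracy vs ~82% for PCA with 2 components, because LDA's components are chosen for class separation, not maximum variance. Exact figures depend on the train/test split, feature scaling and the value of k.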
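A comparison of this kind can be reproduced with scikit-learn. The sketch below is one possible setup, not the one used to obtain the figures quoted above: the 70/30 stratified split, the StandardScaler step, k = 3 and the helper name knn_accuracy are all assumptions, so the printed accuracies will likely differ from the quoted ones.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)                # 178 samples, 13 features, 3 classes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

def knn_accuracy(reducer):
    """Scale, reduce to 2-D with the given reducer, then score a 3-NN classifier."""
    model = make_pipeline(
        StandardScaler(), reducer, KNeighborsClassifier(n_neighbors=3)
    )
    model.fit(X_train, y_train)                  # LDA uses y_train; PCA ignores it
    return model.score(X_test, y_test)

lda_acc = knn_accuracy(LinearDiscriminantAnalysis(n_components=2))
pca_acc = knn_accuracy(PCA(n_components=2))
print(f"KNN on LDA(2): {lda_acc:.3f}")
print(f"KNN on PCA(2): {pca_acc:.3f}")
```

Fitting the reducer inside the pipeline on the training split only avoids leaking test labels into the LDA projection.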

Common Pitfalls

  • LDA performs poorly when within-class covariances differ significantly (use QDA instead)
  • S_W is singular (non-invertible) when n_features exceeds n_samples - n_classes, so S_W^{-1} does not exist (use regularized LDA)
  • Sensitive to outliers, which distort the class means and scatter matrices
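The singular S_W pitfall and the usual fix of adding lambda*I can be illustrated with NumPy. This is a sketch under stated assumptions: the random two-class data, the sizes (20 samples, 50 features) and the value lam = 1e-2 are arbitrary choices for the demonstration, and in practice lambda would be tuned, for example by cross-validation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 20, 50                   # more features than samples
X = rng.normal(size=(n_samples, n_features))
y = np.repeat([0, 1], n_samples // 2)

# Within-class scatter matrix S_W
S_W = np.zeros((n_features, n_features))
for c in np.unique(y):
    centered = X[y == c] - X[y == c].mean(axis=0)
    S_W += centered.T @ centered

# Rank is at most n_samples - n_classes, so S_W cannot be inverted
rank_before = np.linalg.matrix_rank(S_W)

lam = 1e-2                                       # hypothetical regularization strength
S_W_reg = S_W + lam * np.eye(n_features)         # regularized scatter matrix
rank_after = np.linalg.matrix_rank(S_W_reg)

# Two-class Fisher direction, now solvable: w = S_W_reg^{-1} (mu_1 - mu_0)
mu0 = X[y == 0].mean(axis=0)
mu1 = X[y == 1].mean(axis=0)
w = np.linalg.solve(S_W_reg, mu1 - mu0)

print(rank_before, rank_after, w.shape)
```

In scikit-learn, a related shrinkage-based regularization is available through LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto"), which shrinks the covariance estimate rather than adding a fixed lambda*I.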

Exam-Ready Summary

  • LDA: supervised, maximizes between-class/within-class scatter ratio
  • Max components = min(C-1, n_features) for a C-class problem (usually far fewer than PCA allows)
  • Assumes Gaussian classes with equal covariances (unlike QDA which allows different covariances)
  • LDA is the Bayes optimal classifier under its assumptions
  • Regularized LDA (RLDA) handles p > n by adding lambda*I to S_W