Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

Unit II Overview: Dimensionality Reduction

Lesson 7 of 22 in the free Machine Learning II notes on Siksha Sarovar, written by Rohit Jangra.

High-dimensional data poses fundamental computational and statistical challenges. Dimensionality reduction methods find compact, low-dimensional representations that preserve the most important structure of the data.

The Curse of Dimensionality

Introduced by Bellman (1961), this phenomenon describes how data becomes increasingly sparse in high-dimensional spaces:

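  • The volume of a hypersphere inscribed in a hypercube, relative to the cube's volume, tends to 0 as the number of dimensions increases (almost all of the cube's volume ends up in its corners)
  • K-nearest neighbors becomes unreliable because pairwise distances concentrate: the nearest and farthest neighbors of a point end up at almost the same distance
  • The amount of data needed to maintain the same sampling density grows exponentially with the number of dimensions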
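The first two effects above can be checked numerically. The following is a minimal sketch that assumes data drawn uniformly from a unit hypercube. The function names `ball_to_cube_volume_ratio` and `distance_contrast` are hypothetical helpers and do not come from any library. The volume ratio uses the standard volume of a unit d-ball, π^(d/2) / Γ(d/2 + 1), which is evaluated in log space so it does not overflow for large d.

```python
import math
import random

def ball_to_cube_volume_ratio(d: int) -> float:
    """Volume of the unit-radius d-ball divided by the volume of the
    enclosing cube [-1, 1]^d. Computed in log space to avoid overflow."""
    log_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    log_cube = d * math.log(2.0)
    return math.exp(log_ball - log_cube)

def distance_contrast(d: int, n_points: int = 500, seed: int = 0) -> float:
    """(max - min) / min of the Euclidean distances from one random query
    point to n_points random points, all drawn uniformly from [0, 1]^d."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(d)]
    dists = [
        math.dist(query, [rng.random() for _ in range(d)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

# Print both quantities for a few dimensions.
for d in (2, 3, 10, 100, 200):
    print(f"d={d:4d}  ball/cube volume={ball_to_cube_volume_ratio(d):.2e}  "
          f"distance contrast={distance_contrast(d):.3f}")
```

As d grows, the ball-to-cube ratio falls rapidly toward zero: it is π/4 ≈ 0.785 at d = 2 and π/6 ≈ 0.524 at d = 3. At the same time the relative contrast between the farthest and nearest point shrinks. This is why distance-based methods such as k-NN lose their discriminating power.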

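Edge length of a sub-cube needed to capture a fraction r of the data, assuming the data is uniformly distributed in the unit hypercube [0, 1]^d: l(d) = r^(1/d). For r = 10%, this gives l(d) = 0.1^(1/d).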

At d=10, you need ~79% of each dimension's range to capture just 10% of the data.
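The edge-length formula is a one-liner. This sketch evaluates it for several dimensions under the same uniform-data assumption; the helper name `edge_length` is illustrative.

```python
def edge_length(fraction: float, d: int) -> float:
    """Edge length of a sub-cube of [0, 1]^d containing `fraction` of
    uniformly distributed data: l(d) = fraction ** (1 / d)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return fraction ** (1.0 / d)

# Edge length needed to capture 10% of the data in each dimension count.
for d in (1, 2, 3, 10, 100):
    print(f"d={d:3d}  edge length for 10% of the data: {edge_length(0.1, d):.3f}")
```

This reproduces the figure quoted above: about 0.794 at d = 10 and about 0.977 at d = 100. A "local" neighborhood therefore has to span almost the entire range of every feature.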

Why Dimensionality Reduction?

Motivation | Description
--- | ---
Visualization | Reduce to 2D/3D for human understanding
Noise reduction | Remove irrelevant/noisy dimensions
Computational efficiency | Faster training and inference
Avoid curse of dimensionality | Better generalization with fewer features
Feature discovery | Find latent structure in data

Unit II Roadmap

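Technique | Type | Key Property
--- | --- | ---
LDA | Supervised linear | Maximizes class separability (between-class vs. within-class scatter)
PCA | Unsupervised linear | Maximizes retained variance
Kernel PCA | Unsupervised non-linear | Captures non-linear structure by applying PCA in a kernel-induced feature space
Factor Analysis | Probabilistic linear | Explains correlations through a few shared latent factors plus per-feature noise
ICA | Unsupervised linear | Recovers statistically independent (non-Gaussian) source signals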

Two Paradigms

  • Feature Selection: Select a subset of original features (interpretable, no transformation)
  • Feature Extraction: Create new features as functions of originals (PCA, LDA, ICA)
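Here is a minimal NumPy sketch of the two paradigms on hypothetical toy data with 5 features, the first two nearly duplicates of each other.

* Selection uses a simple "keep the k highest-variance columns" rule. This is only one of many possible selection criteria.
* Extraction uses PCA computed via SVD.

The data, `k` and all variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples, 5 features; columns 0 and 1 are noisy copies of z.
n = 200
z = rng.normal(size=n)
X = np.column_stack([
    z + 0.1 * rng.normal(size=n),
    z + 0.1 * rng.normal(size=n),
    0.7 * rng.normal(size=n),
    0.5 * rng.normal(size=n),
    0.1 * rng.normal(size=n),
])
k = 2

# Feature selection: keep the k original columns with the largest variance.
selected = np.argsort(X.var(axis=0))[::-1][:k]
X_selected = X[:, selected]          # still original, interpretable features

# Feature extraction (PCA): project centred data onto the top-k principal directions.
X_centred = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centred, full_matrices=False)
W = Vt[:k].T                         # shape (5, k): each new feature is a linear combination
X_extracted = X_centred @ W

print("selected columns:", selected)
print("PC1 loadings:", np.round(W[:, 0], 2))
```

Variance-based selection keeps both redundant columns 0 and 1. The first principal component instead merges them into a single new feature, with loadings of roughly equal magnitude on columns 0 and 1. The extracted features are also uncorrelated by construction.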

Exam-Ready Summary

  • Curse of dimensionality: for a fixed number of samples, data becomes sparse as dimensions grow, and the data needed to keep the same density grows exponentially
  • Dimensionality reduction can be supervised (LDA) or unsupervised (PCA)
  • Linear methods: PCA, LDA, FA, ICA; fast and comparatively interpretable
  • Non-linear methods: Kernel PCA, t-SNE, UMAP; powerful but harder to interpret
  • Always check how much variance/information is retained after reduction
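For PCA, checking retained variance can be sketched as follows. First compute each component's share of the total variance from the singular values of the centred data. Then pick the smallest number of components that reaches a target such as 95%. The function names, the threshold and the synthetic data (3 hidden factors mixed into 10 features plus small noise) are illustrative assumptions. Supervised or non-linear methods would need a different retention measure.

```python
import numpy as np

def explained_variance_ratio(X: np.ndarray) -> np.ndarray:
    """Fraction of total variance captured by each principal component of X,
    in decreasing order."""
    X_centred = X - X.mean(axis=0)
    singular_values = np.linalg.svd(X_centred, compute_uv=False)
    variances = singular_values ** 2       # proportional to the PC variances
    return variances / variances.sum()

def components_for(X: np.ndarray, threshold: float = 0.95) -> int:
    """Smallest number of components whose cumulative explained variance >= threshold."""
    cumulative = np.cumsum(explained_variance_ratio(X))
    # Cap at the number of components to guard against floating-point round-off.
    return min(int(np.searchsorted(cumulative, threshold)) + 1, len(cumulative))

# Hypothetical data: 3 hidden factors linearly mixed into 10 features, plus small noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 10))

ratios = explained_variance_ratio(X)
print("cumulative explained variance:", np.round(np.cumsum(ratios), 3))
print("components for 95% variance:", components_for(X, 0.95))
```

On this synthetic data the first three components should account for nearly all of the variance, matching the three hidden factors.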