Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

11. The Perceptron

Lesson 14 of 22 in the free Machine Learning II notes on Siksha Sarovar, written by Rohit Jangra.

The Perceptron, invented by Frank Rosenblatt in 1957, was the first algorithmically described neural network. It sparked enormous optimism, but Minsky and Papert (1969) later showed that a single-layer perceptron cannot learn functions that are not linearly separable, such as XOR. This limitation ultimately motivated multi-layer networks trained with backpropagation.

Perceptron Model

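Given an input x in R^d, the perceptron computes:

y_hat = sign(w^T x + b) = sign(sum_{i=1..d} w_i x_i + b)

where w in R^d is the weight vector and b is the bias term. The sign function outputs +1 or -1, so the two classes are labeled y in {-1, +1}; the boundary case w^T x + b = 0 is assigned to one class by convention.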
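
As a minimal sketch in Python, the forward computation is a dot product plus the bias, followed by a threshold. The helper names `sign` and `predict` are hypothetical, and mapping a score of exactly 0 to -1 is an assumed tie-breaking convention. The hand-picked weights w = (1, 1), b = -1.5 (also illustrative) make the perceptron compute logical AND on inputs in {0, 1}:

```python
from typing import Sequence


def sign(z: float) -> int:
    """Return +1 for z > 0 and -1 otherwise (assumed convention: sign(0) = -1)."""
    return 1 if z > 0 else -1


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Perceptron output y_hat = sign(w^T x + b)."""
    activation = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return sign(activation)


# Hypothetical hand-picked parameters that implement logical AND on {0, 1} inputs.
w_and = [1.0, 1.0]
b_and = -1.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, predict(w_and, b_and, x))
```

Only the input (1, 1) gives a positive score (1 + 1 - 1.5 = 0.5), so it is the only one classified as +1.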

Perceptron Learning Algorithm

  1. Initialize the weights w = 0 and the bias b = 0 (or use small random values)
  2. For each epoch:
     • For each training example (x_i, y_i):
       • Compute the prediction: y_hat = sign(w^T x_i + b)
       • If y_hat != y_i (misclassified), update:
         w = w + learning_rate * y_i * x_i
         b = b + learning_rate * y_i
  3. Repeat until convergence (a full epoch with no misclassifications) or until the maximum number of epochs is reached
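
The steps above can be sketched in plain Python as follows. This is an illustrative implementation, not a reference one: `train_perceptron` is a hypothetical name, it starts from w = 0 and b = 0, and it detects a mistake with y_i * (w^T x_i + b) <= 0. That test matches y_hat != y_i except that a point lying exactly on the hyperplane always counts as misclassified, so the very first example triggers an update.

```python
from typing import List, Sequence, Tuple


def train_perceptron(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    learning_rate: float = 1.0,
    max_epochs: int = 100,
) -> Tuple[List[float], float, int, bool]:
    """Perceptron learning algorithm; labels must be +1 or -1.

    Returns (w, b, total_mistakes, converged).
    """
    w = [0.0] * len(X[0])  # step 1: w = 0
    b = 0.0                #         b = 0
    mistakes = 0
    for _ in range(max_epochs):                 # step 2: loop over epochs
        errors_this_epoch = 0
        for x_i, y_i in zip(X, y):              # loop over training examples
            score = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b
            if y_i * score <= 0:                # misclassified (or on the boundary)
                w = [w_j + learning_rate * y_i * x_j for w_j, x_j in zip(w, x_i)]
                b += learning_rate * y_i
                errors_this_epoch += 1
                mistakes += 1
        if errors_this_epoch == 0:              # step 3: a clean epoch means convergence
            return w, b, mistakes, True
    return w, b, mistakes, False


# Logical AND with labels in {-1, +1}: linearly separable.
X_and = [(0, 0), (0, 1), (1, 0), (1, 1)]
y_and = [-1, -1, -1, 1]
w, b, mistakes, converged = train_perceptron(X_and, y_and)
print("w =", w, "b =", b, "mistakes =", mistakes, "converged =", converged)
```

Because the AND data is linearly separable, the loop ends with a clean epoch well before `max_epochs`; on non-separable data it would stop only at `max_epochs` and return `converged = False`.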

Perceptron Convergence Theorem

Theorem (Rosenblatt, 1962): If the training data is linearly separable, the Perceptron algorithm will converge to a separating hyperplane in a finite number of steps.

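Specifically, starting from w = 0, the algorithm makes at most (R / gamma)^2 mistakes (weight updates) before converging, where:

  • R = max_i ||x_i||, the largest norm among the training examples
  • gamma = geometric margin of the optimal (maximum-margin) separating hyperplane, i.e. its distance to the closest training example; the margin of any other separating hyperplane also gives a valid, looser bound

The bound is stated for a hyperplane through the origin; the bias b is handled by appending a constant 1 to every x_i and treating b as one more weight, with R and gamma measured on these augmented vectors.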
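
The bound can be checked numerically on a small example. This sketch rests on several assumptions: the toy data is hypothetical, the bias is folded into the weights by appending a constant 1 to each input (so R and the margin are measured on augmented vectors), and instead of solving for the maximum-margin hyperplane it uses the margin gamma_u of one known separator, the line x1 + x2 - 1 = 0. Since gamma_u cannot exceed the optimal margin, (R / gamma_u)^2 is a looser but still valid upper bound. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def augment(x: Sequence[float]) -> List[float]:
    """Append a constant 1 so the bias acts as an ordinary weight."""
    return [float(v) for v in x] + [1.0]


def count_mistakes(
    X: Sequence[Sequence[float]], y: Sequence[int], max_epochs: int = 1000
) -> Tuple[int, bool]:
    """Perceptron with learning rate 1 and zero init on augmented inputs.

    Returns (number of updates, whether a clean epoch was reached).
    """
    Z = [augment(x) for x in X]
    w = [0.0] * len(Z[0])
    mistakes = 0
    for _ in range(max_epochs):
        clean_pass = True
        for z, label in zip(Z, y):
            if label * sum(w_j * z_j for w_j, z_j in zip(w, z)) <= 0:
                w = [w_j + label * z_j for w_j, z_j in zip(w, z)]
                mistakes += 1
                clean_pass = False
        if clean_pass:
            return mistakes, True
    return mistakes, False


# Hypothetical toy data: positives above the line x1 + x2 = 1, negatives below it.
X = [(2, 2), (3, 1), (2, 3), (0, 0), (-1, 0), (0, -1)]
y = [1, 1, 1, -1, -1, -1]

# R: largest Euclidean norm among the augmented examples.
R = max(math.sqrt(sum(v * v for v in augment(x))) for x in X)

# gamma_u: margin of the separator x1 + x2 - 1 = 0, i.e. unit vector u = (1, 1, -1) / sqrt(3).
u = [1.0, 1.0, -1.0]
u_norm = math.sqrt(sum(v * v for v in u))
gamma_u = min(
    label * sum(u_j * z_j for u_j, z_j in zip(u, augment(x))) / u_norm
    for x, label in zip(X, y)
)

mistakes, converged = count_mistakes(X, y)
bound = (R / gamma_u) ** 2
print(f"updates = {mistakes}, bound = {bound:.1f}, converged = {converged}")
```

On this data the perceptron makes only a handful of updates, far below the bound; the theorem guarantees only the upper limit, which holds for any order in which the examples are presented.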

Geometric Interpretation

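The perceptron finds a hyperplane w^T x + b = 0 that separates positive from negative examples. Each update on a misclassified example (x_i, y_i) increases that example's signed score y_i (w^T x_i + b), shifting and rotating the hyperplane toward classifying it correctly; a single update is not always enough to fix it.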
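
To see why an update moves the hyperplane in the right direction, write eta for learning_rate (eta > 0) and apply one update, w' = w + eta y_i x_i and b' = b + eta y_i, to a misclassified example with y_i in {-1, +1}. Using y_i^2 = 1, the signed score grows by a strictly positive amount:

```latex
\begin{aligned}
y_i\left(w'^{\top} x_i + b'\right)
  &= y_i\left(w^{\top} x_i + b\right) + \eta\, y_i^{2}\left(x_i^{\top} x_i + 1\right) \\
  &= y_i\left(w^{\top} x_i + b\right) + \eta\left(\lVert x_i \rVert^{2} + 1\right)
   \;>\; y_i\left(w^{\top} x_i + b\right)
\end{aligned}
```

The increase eta (||x_i||^2 + 1) is finite, so an example that was far on the wrong side can still be misclassified after one update.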

Limitations

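Limitation | Implication
--- | ---
Linearly separable only | Cannot solve XOR or any other problem whose classes are not linearly separable
No convergence for non-separable data | The weights never settle; the algorithm keeps making updates indefinitely
Single layer | Cannot learn hierarchical features
Hard threshold | sign is not differentiable (its gradient is zero almost everywhere), so gradient-based optimization cannot be used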
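
The first two rows can be observed directly on XOR. In this sketch (the function name `perceptron_epoch_errors` and the 50-epoch cap are illustrative choices), the perceptron never completes an epoch without a mistake: a clean epoch would mean the current w and b classify all four points correctly, and no line separates the XOR classes.

```python
from typing import List, Sequence


def perceptron_epoch_errors(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    learning_rate: float = 1.0,
    max_epochs: int = 50,
) -> List[int]:
    """Train a zero-initialised perceptron; return the mistake count of every epoch."""
    w = [0.0] * len(X[0])
    b = 0.0
    history: List[int] = []
    for _ in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            score = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b
            if y_i * score <= 0:  # misclassified or on the boundary
                w = [w_j + learning_rate * y_i * x_j for w_j, x_j in zip(w, x_i)]
                b += learning_rate * y_i
                errors += 1
        history.append(errors)
        if errors == 0:  # never happens for XOR
            break
    return history


# XOR with labels in {-1, +1}: not linearly separable.
X_xor = [(0, 0), (0, 1), (1, 0), (1, 1)]
y_xor = [-1, 1, 1, -1]
history = perceptron_epoch_errors(X_xor, y_xor)
print("epochs run:", len(history))
print("mistakes per epoch:", history)
```

Every one of the 50 epochs contains at least one mistake, so training stops only because of the epoch cap.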

Common Pitfalls

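  • The convergence theorem applies only to linearly separable data; when separability is not known in advance, cap the number of epochs, or the loop may never terminate
  • With zero initialization, the learning rate only rescales w and b, so it changes neither the sequence of mistakes nor the learned classifier; it can matter when the weights start from non-zero (e.g. random) values
  • Many separating hyperplanes usually exist; the perceptron returns one of them (which one depends on the initialization and the order of the examples), not necessarily the best. An SVM finds the maximum-margin one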
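
How the learning rate enters can be checked with a short experiment. With zero initialization, after any sequence of updates w = learning_rate * sum(y_i x_i) and b = learning_rate * sum(y_i), summed over the mistakes made so far, so learning_rate > 0 factors out of every score and cannot change any sign. The sketch below (hypothetical toy data and function name `train`) trains twice, with learning rates 1.0 and 0.25, both chosen so the floating-point arithmetic stays exact on integer inputs.

```python
from typing import List, Sequence, Tuple


def train(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    learning_rate: float,
    max_epochs: int = 100,
) -> Tuple[List[float], float, int]:
    """Zero-initialised perceptron; returns (w, b, total number of updates)."""
    w = [0.0] * len(X[0])
    b = 0.0
    mistakes = 0
    for _ in range(max_epochs):
        clean_pass = True
        for x_i, y_i in zip(X, y):
            score = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b
            if y_i * score <= 0:  # misclassified or on the boundary
                w = [w_j + learning_rate * y_i * x_j for w_j, x_j in zip(w, x_i)]
                b += learning_rate * y_i
                mistakes += 1
                clean_pass = False
        if clean_pass:
            break
    return w, b, mistakes


# Hypothetical linearly separable toy data.
X = [(2, 2), (3, 1), (2, 3), (0, 0), (-1, 0), (0, -1)]
y = [1, 1, 1, -1, -1, -1]

w1, b1, m1 = train(X, y, learning_rate=1.0)
w2, b2, m2 = train(X, y, learning_rate=0.25)
print("updates:", m1, m2)   # equal: the same examples trigger updates
print("lr=1.0 :", w1, b1)
print("lr=0.25:", w2, b2)   # each value is 0.25 times the line above
```

Both runs make exactly the same updates and end with the same hyperplane, only scaled.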

Exam-Ready Summary

  • Perceptron: linear classifier using sign activation and error-based weight updates
  • Convergence theorem: guaranteed convergence in finite steps on linearly separable data
  • Number of mistakes bounded by (R/gamma)^2 (margin-dependent)
  • Cannot solve XOR — requires hidden layers (MLP)
  • Perceptron mistake bound is the precursor to Support Vector Machine theory