Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

13. Learning Neural Network Structures

Lesson 16 of 22 in the free Machine Learning II notes on Siksha Sarovar, written by Rohit Jangra.

The architecture and regularization of a neural network are just as important as the training algorithm itself. This lesson covers practical techniques for designing networks that generalize well.

Regularization Techniques

L1 and L2 Weight Decay

Adding a penalty on the weights to the loss function discourages large weights:

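  • L2 (Ridge/Weight Decay): L_reg = L + (lambda/2) * ||W||^2, where ||W||^2 is the sum of squared weights. Shrinks all weights toward zero and gives smooth solutions.
  • L1 (Lasso): L_reg = L + lambda * ||W||_1, where ||W||_1 is the sum of absolute weight values. Drives many weights to exactly zero, which promotes sparsity.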
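
To make the difference between the two penalties concrete, here is a minimal sketch in plain Python. The function names, the example weights and the value lambda = 0.1 are illustrative assumptions, not part of the lesson. It computes each penalty and its gradient with respect to the weights.

```python
def l2_penalty(weights, lam):
    """(lambda / 2) * sum of squared weights."""
    return 0.5 * lam * sum(w * w for w in weights)


def l1_penalty(weights, lam):
    """lambda * sum of absolute weight values."""
    return lam * sum(abs(w) for w in weights)


def l2_grad(weights, lam):
    """Gradient of the L2 penalty: proportional to each weight."""
    return [lam * w for w in weights]


def l1_grad(weights, lam):
    """Subgradient of the L1 penalty: lambda * sign(w), taken as 0 at w = 0."""
    return [lam * ((w > 0) - (w < 0)) for w in weights]


# Illustrative values (assumptions for this example only).
weights = [0.5, -2.0, 0.0, 3.0]
lam = 0.1

print("L2 penalty:", l2_penalty(weights, lam))
print("L1 penalty:", l1_penalty(weights, lam))
print("L2 gradient:", l2_grad(weights, lam))
print("L1 gradient:", l1_grad(weights, lam))
```

The L2 gradient scales with the weight, so large weights are pulled harder and small ones barely move. The L1 gradient has the same magnitude for every nonzero weight, which is why small weights can be pushed all the way to zero.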

Dropout (Srivastava et al., 2014)

During training, randomly set each neuron's output to zero with probability p (typically p = 0.5 for hidden layers and p = 0.2 for the input layer):

  • Prevents co-adaptation of neurons
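  • Approximately equivalent to training an ensemble of up to 2^n thinned networks that share weights (n = number of units that can be dropped)
  • At test time, keep all neurons and scale activations by the keep probability (1-p). With inverted dropout, the kept activations are instead divided by (1-p) during training, so no scaling is needed at test time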
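
The inverted-dropout variant mentioned above can be sketched in a few lines of plain Python. The function name `inverted_dropout` and its arguments are assumptions made for this example. It is a minimal illustration, not a framework implementation.

```python
import random


def inverted_dropout(activations, p, training, rng):
    """Apply inverted dropout to a list of activations.

    p is the drop probability. Kept activations are divided by (1 - p)
    during training so their expected value is unchanged; at test time
    the input is returned as-is.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the run is repeatable
hidden = [1.0] * 10000

train_out = inverted_dropout(hidden, p=0.5, training=True, rng=rng)
test_out = inverted_dropout(hidden, p=0.5, training=False, rng=rng)

print("mean during training:", sum(train_out) / len(train_out))
print("mean at test time:", sum(test_out) / len(test_out))
```

Both means should come out close to 1.0: roughly half the training activations are zero and the rest are doubled, so the expected activation matches the untouched test-time value.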

Batch Normalization (Ioffe and Szegedy, 2015)

Normalize activations within each mini-batch: BN(x) = gamma * (x - mu_batch) / sqrt(sigma_batch^2 + epsilon) + beta

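Benefits: originally motivated as a way to reduce internal covariate shift (the change in the distribution of layer inputs during training); in practice it allows higher learning rates and has a mild regularization effect.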
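
The BN formula can be checked by hand on a tiny mini-batch. The sketch below normalizes a single feature in plain Python. The function names are assumptions for this example. The inference function relies on the common practice of replacing batch statistics with stored (running) estimates at test time, which the lesson does not cover explicitly.

```python
import math


def batch_norm_train(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift."""
    n = len(batch)
    mu = sum(batch) / n
    var = sum((x - mu) ** 2 for x in batch) / n  # biased batch variance
    out = [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in batch]
    return out, mu, var


def batch_norm_infer(x, stored_mu, stored_var, gamma=1.0, beta=0.0, eps=1e-5):
    """Same formula, but using stored statistics instead of batch ones."""
    return gamma * (x - stored_mu) / math.sqrt(stored_var + eps) + beta


batch = [1.0, 2.0, 3.0, 4.0]
normalized, mu, var = batch_norm_train(batch)

print("batch mean and variance:", mu, var)
print("normalized values:", normalized)
print("single test input:", batch_norm_infer(2.5, mu, var))
```

With gamma = 1 and beta = 0 the normalized values have mean close to 0 and variance close to 1. The learnable gamma and beta let the network undo the normalization if that turns out to be useful.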

Architecture Design Guidelines

Design Choice | Recommendation
Width | More neurons = more capacity; use grid search
Depth | Start shallow, add layers if underfitting
Activation | ReLU (hidden layers), sigmoid/softmax (output)
Dropout rate | 0.2-0.5 for hidden layers
Batch Norm placement | Before activation function

Hyperparameter Search Strategies

  1. Start with reasonable defaults (e.g., lr=0.001, Adam optimizer, dropout=0.3)
  2. Use a learning rate range test to find a good learning rate
  3. Prefer random search over grid search when tuning more than 3 hyperparameters
  4. Use validation loss as criterion, not training loss
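
The random-search step and the validation-loss criterion can be sketched as follows. Everything here is illustrative: `toy_validation_loss` is a hypothetical stand-in for "train the network with this configuration and return its validation loss", and the search ranges are assumptions loosely based on the defaults above.

```python
import math
import random


def sample_config(rng):
    """Draw one random hyperparameter configuration."""
    return {
        "lr": 10 ** rng.uniform(-5, -1),      # log-uniform learning rate
        "dropout": rng.uniform(0.2, 0.5),     # range from the guidelines
        "hidden_units": rng.choice([32, 64, 128, 256]),
    }


def toy_validation_loss(config):
    """Hypothetical stand-in for training a model and scoring it on
    held-out data. It is smallest near lr = 1e-3 and dropout = 0.3."""
    return (math.log10(config["lr"]) + 3) ** 2 + (config["dropout"] - 0.3) ** 2


def random_search(n_trials, seed=0):
    """Keep the configuration with the lowest validation loss."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        loss = toy_validation_loss(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


best_config, best_loss = random_search(n_trials=50)
print("best configuration:", best_config)
print("validation loss:", best_loss)
```

Sampling the learning rate on a log scale is a deliberate choice: useful values span several orders of magnitude, so a uniform draw on the raw scale would rarely try small rates.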

Common Pitfalls

  • Leaving dropout active during inference makes predictions noisy and unreliable; always disable it at test time
  • Batch normalization interacts with dropout; when using both in one block, place BN before dropout
  • Very wide networks can overfit even with regularization; depth + regularization is often better
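
The ordering advice (BN before the activation, dropout after it, dropout off at test time) can be put together in one hidden block. This is a plain-Python sketch with assumed function names and made-up weights. For brevity BN uses gamma = 1, beta = 0 and always normalizes with the statistics of the batch it is given, whereas real frameworks switch to stored statistics at inference.

```python
import math
import random


def linear(batch, weights, bias):
    """Affine layer: one row of `weights` per output unit."""
    return [[sum(w * x for w, x in zip(row, sample)) + b
             for row, b in zip(weights, bias)]
            for sample in batch]


def batch_norm(batch, eps=1e-5):
    """Normalize each feature (column) over the mini-batch."""
    n = len(batch)
    columns = list(zip(*batch))
    means = [sum(col) / n for col in columns]
    variances = [sum((v - m) ** 2 for v in col) / n
                 for col, m in zip(columns, means)]
    return [[(v - m) / math.sqrt(var + eps)
             for v, m, var in zip(sample, means, variances)]
            for sample in batch]


def relu(batch):
    return [[max(0.0, v) for v in sample] for sample in batch]


def dropout(batch, p, training, rng):
    """Inverted dropout; a no-op when training is False."""
    if not training:
        return batch
    keep = 1.0 - p
    return [[v / keep if rng.random() < keep else 0.0 for v in sample]
            for sample in batch]


def hidden_block(batch, weights, bias, p, training, rng):
    z = linear(batch, weights, bias)
    z = batch_norm(z)                     # BN before the activation
    a = relu(z)
    return dropout(a, p, training, rng)   # dropout last, off at test time


# Made-up inputs and parameters for illustration.
batch = [[1.0, 2.0], [3.0, 0.0], [0.0, 1.0]]
weights = [[0.5, -0.5], [1.0, 1.0]]
bias = [0.0, 0.1]

train_out = hidden_block(batch, weights, bias, 0.5, True, random.Random(1))
eval_out = hidden_block(batch, weights, bias, 0.5, False, random.Random(1))
print("training mode:", train_out)
print("inference mode:", eval_out)
```

In inference mode the output is deterministic regardless of the random generator, which is the behaviour the first pitfall asks for.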

Exam-Ready Summary

  • L2 regularization: shrinks weights; L1: promotes sparsity
  • Dropout: randomly zero activations during training, ensemble interpretation
  • Batch normalization: normalize within mini-batch; motivated by reducing internal covariate shift, allows higher learning rates
  • Architecture: more depth generally beats more width for complex tasks
  • Hyperparameter tuning: random search is practical; Bayesian optimization for expensive searches