Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

Probability & Bayes Theorem

Lesson 12 of 37 in the free Data Science notes on Siksha Sarovar, written by Rohit Jangra.

Probability for Data Science

Probability is the mathematical framework for quantifying uncertainty. In data science, most tasks involve uncertainty, from predicting customer behavior to classifying images. Probability provides the language and tools to reason about uncertain events rigorously.

---

Fundamental Concepts

Definition of Probability: The probability of an event A, written P(A), is a number between 0 and 1 that represents the likelihood of the event occurring.

  • P(A) = 0 → Impossible event
  • P(A) = 1 → Certain event
  • 0 < P(A) < 1 → The event may or may not occur

Basic Formula (when all outcomes are equally likely): P(A) = Number of favorable outcomes / Total number of outcomes

Example: Probability of rolling a 4 on a fair die = 1/6 ≈ 0.167
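The counting formula can be sketched in a few lines of Python. This is only an illustration under the assumption that every outcome is equally likely; the helper name `classical_probability` is hypothetical and does not come from the lesson.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """P(A) = favorable outcomes / total outcomes (equally likely outcomes assumed)."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))


die = {1, 2, 3, 4, 5, 6}          # sample space of a fair six-sided die
p_four = classical_probability({4}, die)

print(p_four, round(float(p_four), 3))  # 1/6 0.167
```

Using `Fraction` keeps the result exact, which makes it easy to compare with the hand calculation.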

---

Key Probability Rules

| Rule | Formula | Description |
|---|---|---|
| Complement | P(A') = 1 - P(A) | Probability of A NOT happening |
| Addition (OR) | P(A ∪ B) = P(A) + P(B) - P(A ∩ B) | Probability of A or B (or both) |
| Multiplication (AND) | `P(A ∩ B) = P(A) × P(B\|A)` | Probability of both A and B |
| Independence | P(A ∩ B) = P(A) × P(B) | Holds only when A and B don't affect each other |
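As a rough check of the rules above, the sketch below verifies them on a fair die. The events A ("even number") and B ("greater than 3") are chosen here purely for illustration and are not part of the lesson.

```python
from fractions import Fraction

sample_space = set(range(1, 7))   # fair six-sided die


def prob(event):
    """Probability of an event as favorable / total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


A = {2, 4, 6}   # illustrative event: roll is even
B = {4, 5, 6}   # illustrative event: roll is greater than 3

# Complement rule: P(A') = 1 - P(A)
complement_ok = prob(sample_space - A) == 1 - prob(A)

# Addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
addition_ok = prob(A | B) == prob(A) + prob(B) - prob(A & B)

# Multiplication rule: P(A ∩ B) = P(A) × P(B|A)
p_b_given_a = prob(A & B) / prob(A)
multiplication_ok = prob(A & B) == prob(A) * p_b_given_a

# Independence would require P(A ∩ B) = P(A) × P(B); here it fails.
independent = prob(A & B) == prob(A) * prob(B)

print(complement_ok, addition_ok, multiplication_ok, independent)
```

For these two events the last check is false (1/3 versus 1/4), so A and B are dependent and the simpler independence formula would give the wrong answer.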

---

Conditional Probability

Definition: The probability of event A occurring given that event B has already occurred.

P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0

Example:

In a class of 100 students, 40 are female. Of these, 10 have scored above 90%. P(Score > 90 | Female) = 10/40 = 0.25 (25%)
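The classroom example can be reproduced directly from the definition. The variable names below are assumptions made for this sketch; the counts are the ones given in the example.

```python
from fractions import Fraction

total_students = 100
female_students = 40
female_above_90 = 10

# P(B): probability that a student is female
p_female = Fraction(female_students, total_students)

# P(A ∩ B): probability that a student is female AND scored above 90%
p_female_and_above_90 = Fraction(female_above_90, total_students)

# P(A|B) = P(A ∩ B) / P(B)
p_above_90_given_female = p_female_and_above_90 / p_female

print(p_above_90_given_female, float(p_above_90_given_female))  # 1/4 0.25
```

Dividing by P(B) is what restricts attention to the 40 female students, which is why the result matches the shortcut 10/40.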

Why It Matters in Data Science:

  • Spam filters calculate: P(Spam | "free money" in email)
  • Medical diagnosis: P(Disease | Positive Test Result)
  • Recommendation: P(User likes Movie B | User liked Movie A)

---

Random Variables

Definition: A random variable is a variable whose value is a numerical outcome of a random phenomenon. It assigns a number to each outcome in a sample space.

Types:

| Type | Description | Example |
|---|---|---|
| Discrete | Takes countable distinct values | Number of defective items in a batch (0, 1, 2, ...) |
| Continuous | Takes any value in a continuous range | Weight of a person (65.2 kg, 70.8 kg, ...) |

---

Probability Distributions

A probability distribution describes how the probabilities are distributed across the possible values of a random variable.
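To make this concrete, here is a small sketch that builds the distribution of a discrete random variable by enumeration. The choice of random variable (the sum of two fair dice) is an assumption made for illustration, not an example from the lesson.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# The random variable X maps each outcome to a number: the sum of the two dice.
counts = Counter(first + second for first, second in outcomes)

# The probability distribution assigns P(X = x) to every possible value x.
pmf = {value: Fraction(n, len(outcomes)) for value, n in sorted(counts.items())}

for value, probability in pmf.items():
    print(value, probability)
```

The probabilities over all possible values add up to 1, which is the defining property of any probability distribution.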

Key Discrete Distributions

1. Bernoulli Distribution:

  • Models a single trial with two outcomes (Success/Failure).
  • P(X=1) = p, P(X=0) = 1-p
  • Example: A single coin flip (Heads = 1, Tails = 0).

2. Binomial Distribution:

  • Models the number of successes in n independent Bernoulli trials.
  • Parameters: n (number of trials), p (probability of success per trial).
  • Example: Number of heads in 10 coin flips.

3. Poisson Distribution:

  • Models the number of events occurring in a fixed interval of time/space when events occur independently at a constant rate.
  • Parameter: λ (lambda) = average rate of events.
  • Example: Number of customer arrivals at a store per hour.
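The three discrete distributions above can be written out using only the standard library. This is a minimal sketch of the textbook probability mass functions; the function names and the example rate λ = 4 arrivals per hour are assumptions for illustration.

```python
from math import comb, exp, factorial


def bernoulli_pmf(x, p):
    """P(X = x) for one trial: p for success (x = 1), 1 - p for failure (x = 0)."""
    return p if x == 1 else 1 - p


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(exactly k events in an interval when the average rate is lam)."""
    return exp(-lam) * lam**k / factorial(k)


# Probability of exactly 5 heads in 10 fair coin flips.
print(binomial_pmf(5, 10, 0.5))   # 0.24609375

# Probability of exactly 3 arrivals in an hour, assuming an average of 4 per hour.
print(round(poisson_pmf(3, 4.0), 4))
```

In practice a library such as SciPy offers the same distributions ready-made; writing them out by hand is only meant to show how the parameters n, p and λ enter the formulas.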

Key Continuous Distributions

4. Normal (Gaussian) Distribution:

  • One of the most widely used distributions in statistics, known as the "bell curve".
  • Parameters: μ (mean, center), σ (standard deviation, spread).
  • 68-95-99.7 Rule: about 68% of data falls within 1σ, 95% within 2σ, and 99.7% within 3σ of the mean.
  • Example: Heights of people, IQ scores, measurement errors.

5. Uniform Distribution:

  • Every value in the range [a, b] is equally likely.
  • Example: A random number drawn between 0 and 1. (Rolling a fair die, where each face has P = 1/6, is the discrete counterpart.)
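The 68-95-99.7 rule and the continuous uniform case can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names `share_within` and `uniform_interval_probability` are hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} sigma: {share_within(k):.4f}")


def uniform_interval_probability(low, high, a, b):
    """P(low <= X <= high) for X uniform on [a, b]: the ratio of interval lengths."""
    low, high = max(low, a), min(high, b)
    return max(high - low, 0.0) / (b - a)


# Chance that a uniform random number on [0, 1] lands between 0.2 and 0.5.
print(uniform_interval_probability(0.2, 0.5, 0.0, 1.0))
```

The printed shares come out close to 0.68, 0.95 and 0.997, which is where the rule of thumb gets its name.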

Distribution Summary Table

| Distribution | Type | Parameters | Example Use Case |
|---|---|---|---|
| Bernoulli | Discrete | p (success probability) | Email: Spam or Not Spam |
| Binomial | Discrete | n, p | Defective items in a batch |
| Poisson | Discrete | λ (rate) | Website visits per hour |
| Normal | Continuous | μ, σ | Height, weight, test scores |
| Uniform | Continuous | a, b (min, max) | Random number generation |

---

Bayes' Theorem

Bayes' Theorem is one of the most powerful and widely used results in probability. It allows us to update our beliefs about an event as new evidence becomes available.

Formula: P(A|B) = [P(B|A) × P(A)] / P(B)

Where:

  • P(A|B) = Posterior Probability — Updated belief about A after seeing B.
  • P(B|A) = Likelihood — Probability of seeing B if A is true.
  • P(A) = Prior Probability — Initial belief about A (before evidence).
  • P(B) = Evidence — Total probability of observing B.
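The formula translates almost word for word into code. In the sketch below the evidence P(B) is expanded with the law of total probability, P(B) = P(B|A) × P(A) + P(B|not A) × P(not A). The function name and the spam-filter numbers are hypothetical values chosen only to illustrate the calculation.

```python
def bayes_posterior(prior, likelihood, likelihood_if_not):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A)."""
    # Evidence: total probability of observing B, whether or not A is true.
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical spam-filter numbers (not taken from real data):
p_spam = 0.20                 # prior: P(Spam)
p_phrase_given_spam = 0.30    # likelihood: P("free money" | Spam)
p_phrase_given_ham = 0.01     # P("free money" | Not Spam)

p_spam_given_phrase = bayes_posterior(p_spam, p_phrase_given_spam, p_phrase_given_ham)
print(round(p_spam_given_phrase, 3))   # 0.882
```

With these made-up inputs, seeing the phrase raises the belief that the email is spam from 20% (the prior) to roughly 88% (the posterior).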

---

Bayes' Theorem — Worked Example (Medical Test)

A medical test for a rare disease has the following characteristics:

  • Sensitivity (True Positive Rate): 99%. If you have the disease, the test correctly identifies it 99% of the time.
  • Specificity (True Negative Rate): 95%. If you don't have the disease, the test correctly says negative 95% of the time.
  • Disease Prevalence: 1 in 1000 people (0.1%).

Question: If a person tests positive, what is the probability they actually have the disease?

Solution using Bayes' Theorem:

  • P(Disease) = 0.001
  • P(No Disease) = 0.999
  • P(Positive | Disease) = 0.99
  • P(Positive | No Disease) = 0.05 (False Positive Rate)
  • P(Positive) = (0.99 × 0.001) + (0.05 × 0.999) = 0.00099 + 0.04995 = 0.05094
  • P(Disease | Positive) = (0.99 × 0.001) / 0.05094 ≈ 0.0194 ≈ 1.94%

Surprising Result! Even with a 99% accurate test, the probability of actually having the disease given a positive result is only about 2%. This is because the disease is so rare (low prior).
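The arithmetic in this example can be checked with a short script. The first call uses the numbers from the example; the higher prevalence values in the loop are hypothetical and are included only to show how strongly the result depends on the prior.

```python
def p_disease_given_positive(prevalence, sensitivity, specificity):
    """Posterior probability of disease after one positive test result."""
    false_positive_rate = 1 - specificity
    # P(Positive) = P(Pos | Disease) P(Disease) + P(Pos | No Disease) P(No Disease)
    p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
    return sensitivity * prevalence / p_positive


# Values from the worked example: 0.1% prevalence, 99% sensitivity, 95% specificity.
posterior = p_disease_given_positive(0.001, 0.99, 0.95)
print(f"{posterior:.4f}")   # 0.0194

# Hypothetical prevalence values, to show the effect of the prior.
for prevalence in (0.001, 0.01, 0.1):
    result = p_disease_given_positive(prevalence, 0.99, 0.95)
    print(f"prevalence {prevalence:.3f} -> posterior {result:.3f}")
```

With the same test, a prevalence of 10% instead of 0.1% would push the posterior to roughly 69%, which illustrates that the low 2% figure comes from the rarity of the disease rather than from a poor test.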

---

Applications of Bayes' Theorem in Data Science

| Application | How Bayes Is Used |
|---|---|
| Naive Bayes Classifier | One of the simplest and most effective text classification algorithms (spam detection) |
| Medical Diagnosis | Updating the probability of a disease given test results |
| Search Engines | Ranking pages based on the probability of relevance given a query |
| A/B Testing (Bayesian) | Updating the probability that variant B is better than A as more data comes in |
| Recommendation Systems | Updating user preference models with each interaction |
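To give a feel for the first application in the table, here is a very small Naive Bayes text classifier written from scratch. It is a sketch under several assumptions: the four training messages are invented toy data, words are treated as independent given the label (the "naive" assumption), and add-one (Laplace) smoothing is used, a common choice that the lesson itself does not cover.

```python
from collections import Counter
from math import log

# Hypothetical toy training data, invented purely for illustration.
training = [
    ("spam", "free money now"),
    ("spam", "win free prize"),
    ("ham", "meeting schedule today"),
    ("ham", "project meeting notes"),
]

# Count how often each label and each (label, word) pair occurs.
label_counts = Counter(label for label, _ in training)
word_counts = {label: Counter() for label in label_counts}
for label, text in training:
    word_counts[label].update(text.split())
vocabulary = {word for counts in word_counts.values() for word in counts}


def log_posterior(label, text):
    """Unnormalised log P(label | text) = log prior + sum of log likelihoods."""
    score = log(label_counts[label] / sum(label_counts.values()))   # log P(label)
    total_words = sum(word_counts[label].values())
    for word in text.split():
        # Add-one smoothing keeps unseen words from giving zero probability.
        likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
        score += log(likelihood)
    return score


def classify(text):
    """Pick the label with the highest posterior score."""
    return max(label_counts, key=lambda label: log_posterior(label, text))


print(classify("free money"))        # spam
print(classify("project meeting"))   # ham
```

Logarithms are used so that multiplying many small probabilities does not underflow; the label with the largest log score is also the label with the largest posterior probability.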

Summary

  • Probability quantifies uncertainty on a scale of 0 to 1.
  • Conditional probability is the foundation for many ML algorithms.
  • Random variables can be discrete or continuous.
  • Key distributions (Normal, Binomial, Poisson) model real-world phenomena.
  • Bayes' Theorem lets us update beliefs with new evidence — it powers Naive Bayes classifiers, medical diagnostics, and more.