Probability for Data Science
Probability is the mathematical framework for quantifying uncertainty. In data science, almost everything involves uncertainty — from predicting customer behavior to classifying images. Probability provides the language and tools to reason about uncertain events rigorously.
---
Fundamental Concepts
Definition of Probability: The probability of an event A, written P(A), is a number between 0 and 1 that represents the likelihood of the event occurring.
- P(A) = 0 → Impossible event
- P(A) = 1 → Certain event
- 0 < P(A) < 1 → The event may or may not occur
Basic Formula (valid only when all outcomes are equally likely): P(A) = Number of favorable outcomes / Total number of outcomes
Example: Probability of rolling a 4 on a fair die = 1/6 ≈ 0.167
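Here is a minimal Python sketch of the counting formula, assuming a fair six-sided die. It enumerates the sample space, counts favorable outcomes exactly with the standard library's `fractions.Fraction`, and compares the answer with a seeded simulation. The helper `classical_probability` is a hypothetical name used only for illustration.

```python
import random
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(event) = favorable / total; valid only if all outcomes are equally likely."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

die = [1, 2, 3, 4, 5, 6]
p_four = classical_probability(lambda x: x == 4, die)
print(p_four, float(p_four))  # 1/6 0.16666666666666666

# Empirical check: the relative frequency of 4s approaches 1/6 as trials grow.
rng = random.Random(42)  # fixed seed so the run is reproducible
trials = 100_000
hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == 4)
print(hits / trials)  # approximately 0.167
```

The simulated frequency only settles near 1/6 because every face is assumed equally likely; with a loaded die the counting formula no longer applies.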
---
Key Probability Rules
| Rule | Formula | Description |
|---|---|---|
| Complement | P(A') = 1 - P(A) | Probability of A NOT happening |
| Addition (OR) | P(A ∪ B) = P(A) + P(B) - P(A ∩ B) | Probability of A or B (or both) |
| Multiplication (AND) | P(A ∩ B) = P(A) × P(B \| A) | Probability of both A and B |
| Independence | P(A ∩ B) = P(A) × P(B) | Holds when A and B are independent: knowing one occurred doesn't change the probability of the other |
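The rules can be checked on a small example. This sketch assumes a fair die and three illustrative events that are not part of the original text: A = even roll, B = roll greater than 3, C = roll of at most 2. `P` and `P_given` are hypothetical helpers that use exact `Fraction` arithmetic so each equality holds exactly.

```python
from fractions import Fraction

omega = set(range(1, 7))  # sample space of a fair die
A = {2, 4, 6}             # event: roll is even
B = {4, 5, 6}             # event: roll is greater than 3
C = {1, 2}                # event: roll is at most 2

def P(event):
    """Probability of an event when all outcomes in omega are equally likely."""
    return Fraction(len(event), len(omega))

def P_given(event, given):
    """Conditional probability P(event | given) = P(event ∩ given) / P(given)."""
    return P(event & given) / P(given)

# Complement: P(A') = 1 - P(A)
assert P(omega - A) == 1 - P(A)

# Addition: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert P(A | B) == P(A) + P(B) - P(A & B)

# Multiplication: P(A ∩ B) = P(A) × P(B | A)
assert P(A & B) == P(A) * P_given(B, A)

# A and B are NOT independent: P(A ∩ B) differs from P(A) × P(B).
print(P(A & B), P(A) * P(B))  # 1/3 1/4

# A and C ARE independent: 1/6 == 1/2 × 1/3.
assert P(A & C) == P(A) * P(C)
```

Note that the multiplication rule holds for any pair of events; the simpler product P(A) × P(B) is only correct for independent events such as A and C.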
---
Conditional Probability
Definition: The probability of event A occurring given that event B has occurred (or is known to be true).
P(A|B) = P(A ∩ B) / P(B), defined when P(B) > 0
Example:
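In a class of 100 students, 40 are female, and 10 of those female students scored above 90%. Then P(Score > 90 ∩ Female) = 10/100 = 0.10 and P(Female) = 40/100 = 0.40, so P(Score > 90 | Female) = 0.10 / 0.40 = 0.25 (25%). Equivalently, restrict attention to the 40 female students: 10/40 = 0.25.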
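The same calculation as code shows that conditioning amounts to filtering. The records below are hypothetical and only reproduce the counts stated in the example; the scores of the other 60 students are not given, so they are stored as `None` and never enter the calculation.

```python
from fractions import Fraction

# Hypothetical records: (is_female, scored_above_90). Only the example's counts
# are reproduced; the other 60 students' scores are unknown, so they are None.
students = (
    [(True, True)] * 10
    + [(True, False)] * 30
    + [(False, None)] * 60
)

n = len(students)                                                     # 100
p_female = Fraction(sum(1 for fem, _ in students if fem), n)          # P(Female) = 2/5
p_joint = Fraction(sum(1 for fem, hi in students if fem and hi), n)   # P(Score > 90 ∩ Female) = 1/10

# Definition of conditional probability: P(A | B) = P(A ∩ B) / P(B)
p_high_given_female = p_joint / p_female
print(p_high_given_female)  # 1/4

# Same result by restricting to the rows where the condition (Female) holds.
female_scores = [hi for fem, hi in students if fem]
assert Fraction(sum(female_scores), len(female_scores)) == p_high_given_female
```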
Why It Matters in Data Science:
- Spam filters calculate: P(Spam | "free money" in email)
- Medical diagnosis: P(Disease | Positive Test Result)
- Recommendation: P(User likes Movie B | User liked Movie A)
---
Random Variables
Definition: A random variable is a variable whose value is a numerical outcome of a random phenomenon. It assigns a number to each outcome in a sample space.
Types:
| Type | Description | Example |
|---|---|---|
| Discrete | Takes a countable set of distinct values | Number of defective items in a batch (0, 1, 2, ...) |
| Continuous | Takes any value in a continuous range | Weight of a person (65.2 kg, 70.8 kg, ...) |
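To make "assigns a number to each outcome" concrete, here is a sketch built on an assumed example not in the original: X = number of heads in three fair coin flips. The random variable is literally a Python function on outcomes, and tallying its values over the eight equally likely outcomes gives its distribution.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 8 equally likely outcomes of three fair coin flips.
sample_space = list(product("HT", repeat=3))  # ('H', 'H', 'H'), ('H', 'H', 'T'), ...

def X(outcome):
    """Random variable: maps an outcome (a tuple of 'H'/'T') to its number of heads."""
    return outcome.count("H")

# Distribution of X: the probability of each value X can take.
counts = Counter(X(o) for o in sample_space)
pmf = {k: Fraction(c, len(sample_space)) for k, c in sorted(counts.items())}
print(pmf)  # {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
```

X is discrete because it takes only the countable values 0, 1, 2, 3.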
---
Probability Distributions
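A probability distribution describes how probability is spread across the possible values of a random variable. For a discrete variable it is given by a probability mass function (PMF), P(X = x); for a continuous variable it is given by a probability density function (PDF), where probabilities are areas under the curve.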
Key Discrete Distributions
1. Bernoulli Distribution:
- Models a single trial with two outcomes (Success/Failure).
- P(X=1) = p, P(X=0) = 1 - p
- Example: A single coin flip (Heads = 1, Tails = 0).
2. Binomial Distribution:
- Models the number of successes in n independent Bernoulli trials.
- Parameters: n (number of trials), p (probability of success per trial).
- Example: Number of heads in 10 coin flips.
3. Poisson Distribution:
- Models the number of events occurring in a fixed interval of time/space when events occur independently at a constant rate.
- Parameter: λ (lambda) = average number of events per interval (the rate).
- Example: Number of customer arrivals at a store per hour.
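The discrete distributions above can be written as probability mass functions using only the standard library. This is a sketch, not a library API: `bernoulli_pmf`, `binomial_pmf`, and `poisson_pmf` are hypothetical helper names. They implement the standard PMFs, P(X = k) = C(n, k) p^k (1 - p)^(n - k) for the Binomial and P(X = k) = λ^k e^(-λ) / k! for the Poisson, and the rate λ = 4 arrivals per hour is an assumed value chosen for illustration.

```python
import math
import random

def bernoulli_pmf(x, p):
    """P(X = x) for X ~ Bernoulli(p): p if x == 1, 1 - p if x == 0."""
    return p if x == 1 else 1 - p

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p**k * (1 - p)**(n - k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) = lam**k * exp(-lam) / k! for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Bernoulli is the n = 1 case of the Binomial.
assert binomial_pmf(1, 1, 0.3) == bernoulli_pmf(1, 0.3)

# Probability of exactly 5 heads in 10 fair coin flips.
print(binomial_pmf(5, 10, 0.5))  # 0.24609375

# Simulation: one Binomial(10, 0.5) draw is the sum of 10 independent Bernoulli(0.5) trials.
rng = random.Random(0)
draws = [sum(rng.random() < 0.5 for _ in range(10)) for _ in range(50_000)]
print(draws.count(5) / len(draws))  # approximately 0.246

# Poisson with an assumed rate of 4 arrivals per hour.
lam = 4.0
p_at_most_2 = sum(poisson_pmf(k, lam) for k in range(3))     # P(X <= 2)
mean = sum(k * poisson_pmf(k, lam) for k in range(60))       # tail beyond 60 is negligible
print(round(p_at_most_2, 4), round(mean, 6))  # 0.2381 4.0
```

The computed mean matching λ reflects that λ is the average count per interval. In practice, `scipy.stats` provides `bernoulli`, `binom`, and `poisson` objects with `pmf` methods; the hand-written versions here only make the formulas visible.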
Key Continuous Distributions
4. Normal (Gaussian) Distribution:
- The most important distribution in statistics — the "bell curve".
- Parameters: μ (mean, center), σ (standard deviation, spread).
- 68-95-99.7 Rule: About 68% of values fall within 1σ of the mean, about 95% within 2σ, and about 99.7% within 3σ.
- Example: Heights of people, IQ scores, measurement errors.
5. Uniform Distribution:
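- Every value in the interval [a, b] is equally likely: the density is constant, 1/(b - a), on that interval.
- Example: A random number drawn between 0 and 1. (Rolling a fair die, where each outcome has P = 1/6, is the discrete counterpart.)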
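For continuous distributions, probabilities come from intervals rather than single points. This standard-library sketch computes the Normal CDF through `math.erf` (using Φ(z) = (1 + erf(z/√2)) / 2), reproduces the 68-95-99.7 rule, and checks that equal-length subintervals of Uniform(0, 1) get equal probability. The helpers `normal_cdf` and `uniform_prob` are hypothetical names, and the IQ scaling (mean 100, standard deviation 15) is a common convention assumed for the example.

```python
import math
import random

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ Normal(mu, sigma), computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def uniform_prob(c, d, a, b):
    """P(c <= X <= d) for X ~ Uniform(a, b): length of the overlap divided by (b - a)."""
    return max(0.0, min(b, d) - max(a, c)) / (b - a)

# 68-95-99.7 rule: P(|X - mu| < k * sigma) is the same for every mu and sigma.
for k in (1, 2, 3):
    print(k, round(normal_cdf(k) - normal_cdf(-k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973

# Assumed IQ scaling (mu = 100, sigma = 15): P(IQ > 130), i.e. beyond 2 sigma.
print(round(1 - normal_cdf(130, 100, 15), 4))  # 0.0228

# Uniform(0, 1): intervals of equal length get equal probability.
print(uniform_prob(0.0, 0.25, 0, 1), uniform_prob(0.5, 0.75, 0, 1))  # 0.25 0.25

# random.uniform(a, b) samples Uniform(a, b); the empirical frequency should agree.
rng = random.Random(1)
samples = [rng.uniform(0, 1) for _ in range(100_000)]
frac = sum(0.5 <= s <= 0.75 for s in samples) / len(samples)
print(frac)  # approximately 0.25
```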
Distribution Summary Table
| Distribution | Type | Parameters | Example Use Case |
|---|---|---|---|
| Bernoulli | Discrete | p (success probability) | Email: Spam or Not Spam |
| Binomial | Discrete | n, p | Defective items in a batch |
| Poisson | Discrete | λ (rate) | Website visits per hour |
| Normal | Continuous | μ, σ | Height, weight, test scores |
| Uniform | Continuous | a, b (min, max) | Random number generation |
---
Bayes' Theorem
Bayes' Theorem is one of the most powerful and widely used results in probability. It allows us to update our beliefs about an event as new evidence becomes available.
Formula: P(A|B) = [P(B|A) × P(A)] / P(B)
Where:
- P(A|B) = Posterior Probability: updated belief about A after seeing B.
- P(B|A) = Likelihood: probability of seeing B if A is true.
- P(A) = Prior Probability: initial belief about A (before evidence).
- P(B) = Evidence: total probability of observing B, usually computed as P(B) = P(B|A) × P(A) + P(B|A') × P(A').
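A minimal sketch of the formula, assuming the denominator is expanded with the law of total probability, P(B) = P(B|A) × P(A) + P(B|A') × P(A'). The function names are hypothetical, and the die example (A = "roll a 6", B = "roll is even") is an illustrative check not in the original: among the three even faces exactly one is a 6, so the posterior should be 1/3.

```python
from fractions import Fraction

def total_probability(likelihood, prior, likelihood_if_not):
    """P(B) = P(B|A) * P(A) + P(B|A') * (1 - P(A))."""
    return likelihood * prior + likelihood_if_not * (1 - prior)

def bayes_posterior(likelihood, prior, evidence):
    """Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Toy check with a fair die: A = "roll a 6", B = "roll is even".
prior = Fraction(1, 6)              # P(A)
likelihood = Fraction(1)            # P(B|A): a 6 is always even
likelihood_if_not = Fraction(2, 5)  # P(B|A'): faces 2 and 4 among the other five
evidence = total_probability(likelihood, prior, likelihood_if_not)
print(evidence, bayes_posterior(likelihood, prior, evidence))  # 1/2 1/3
```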
---
Bayes' Theorem — Worked Example (Medical Test)
A medical test for a rare disease has:
- Sensitivity (True Positive Rate): 99%. If you have the disease, the test correctly identifies it 99% of the time.
- Specificity (True Negative Rate): 95%. If you don't have the disease, the test correctly returns negative 95% of the time.
- Disease Prevalence: 1 in 1000 people (0.1%).
Question: If a person tests positive, what is the probability they actually have the disease?
Solution using Bayes' Theorem:
- P(Disease) = 0.001
- P(No Disease) = 0.999
- P(Positive | Disease) = 0.99
- P(Positive | No Disease) = 1 - 0.95 = 0.05 (False Positive Rate)
- P(Positive) = (0.99 × 0.001) + (0.05 × 0.999) = 0.00099 + 0.04995 = 0.05094
- P(Disease | Positive) = (0.99 × 0.001) / 0.05094 ≈ 0.0194 ≈ 1.94%
Surprising Result! Even though the test detects 99% of true cases, the probability of actually having the disease given a positive result is only about 2%. Because the disease is so rare (low prior), the 5% false positives among the 999 healthy people in every 1000 far outnumber the roughly one true case.
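The same computation in Python, using the numbers from the example. The second half is an assumed illustration that re-derives the answer by counting people in a hypothetical population of 100,000, which often makes the effect of the low prior easier to see.

```python
# Inputs taken from the worked example.
prevalence = 0.001                     # P(Disease), the prior
sensitivity = 0.99                     # P(Positive | Disease)
specificity = 0.95                     # P(Negative | No Disease)
false_positive_rate = 1 - specificity  # P(Positive | No Disease)

# Evidence via the law of total probability, then Bayes' Theorem.
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive
print(round(p_positive, 5), round(p_disease_given_positive, 4))  # 0.05094 0.0194

# Same answer by counting people in an assumed population of 100,000.
population = 100_000
sick = round(population * prevalence)                                # 100
true_positives = round(sick * sensitivity)                           # 99
false_positives = round((population - sick) * false_positive_rate)   # 4995
print(round(true_positives / (true_positives + false_positives), 4))  # 0.0194
```

Of the 5,094 people who test positive in that population, only 99 are actually sick.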
---
Applications of Bayes' Theorem in Data Science
| Application | How Bayes Is Used |
|---|---|
| Naive Bayes Classifier | Applies Bayes' Theorem with the "naive" assumption that features (e.g. words) are independent given the class; a simple, fast baseline for text classification such as spam detection |
| Medical Diagnosis | Updating the probability of a disease given test results |
| Search Engines | Ranking pages based on the probability of relevance given a query |
| A/B Testing (Bayesian) | Updating the probability that variant B is better than A as more data comes in |
| Recommendation Systems | Updating user preference models with each interaction |
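To show how Bayes' Theorem becomes a classifier, here is a minimal multinomial Naive Bayes sketch with Laplace (add-one) smoothing. The four training messages are made up for illustration, `fit` and `predict` are hypothetical names, and a real project would more likely use a library implementation such as scikit-learn's `MultinomialNB`. Scores are compared in log space to avoid multiplying many small probabilities, and the evidence term P(words) is dropped because it is the same for every class.

```python
import math
from collections import Counter

# Tiny hypothetical training set, made up for illustration only.
train = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the team on monday", "ham"),
]

def fit(docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {c: Counter() for c in class_counts}
    for text, label in docs:
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict(text, class_counts, word_counts, vocab):
    """Return the class maximizing log P(class) + sum of log P(word | class)."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for c in class_counts:
        score = math.log(class_counts[c] / total_docs)  # log prior
        n_words = sum(word_counts[c].values())
        for w in text.split():
            # Naive assumption: words are independent given the class.
            # Add-one smoothing keeps unseen words from zeroing out the product.
            score += math.log((word_counts[c][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = c, score
    return best_label

model = fit(train)
print(predict("claim your free money", *model))   # spam
print(predict("team meeting on monday", *model))  # ham
```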
---
Summary
- Probability quantifies uncertainty on a scale of 0 to 1.
- Conditional probability is the foundation for many ML algorithms.
- Random variables can be discrete or continuous.
- Key distributions (Normal, Binomial, Poisson) model real-world phenomena.
- Bayes' Theorem lets us update beliefs with new evidence — it powers Naive Bayes classifiers, medical diagnostics, and more.