Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

Practical 13: Covariance Implementation

Lesson 13 of 15 in the free Data Visualisation and Analytics Lab notes on Siksha Sarovar, written by Rohit Jangra.

Aim

To implement covariance between two variables X and Y using np.cov() with ddof=1 (sample covariance), display the full 2 × 2 covariance matrix, and extract the covariance value from its off-diagonal element.

CO Mapping: CO1, CO2, CO5

Theory

Covariance measures how two variables move together. For samples it is defined as

cov(X, Y) = Σ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

Each term pairs X's deviation from its mean with Y's deviation from its mean. When both deviations share a sign (both above or both below their means) the product is positive; opposite signs give negative products. So the sign of covariance is fully interpretable: positive → the variables rise together, negative → one rises as the other falls, near zero → no linear co-movement.
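The definition above can be sketched directly in NumPy. This is a minimal illustration, not the practical's prescribed program: the helper name sample_covariance is hypothetical, and the values are the six X and Y observations from this practical's dataset. Negating Y flips every deviation product, which shows how the sign of the result tracks the direction of co-movement.

```python
import numpy as np

def sample_covariance(x, y):
    """Sample covariance: sum of deviation products divided by n - 1.

    Hypothetical helper written for illustration; np.cov does the same job.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    n = x.size
    if n < 2:
        raise ValueError("need at least two observations")
    dx = x - x.mean()  # deviations of X from its mean
    dy = y - y.mean()  # deviations of Y from its mean
    return float(np.sum(dx * dy) / (n - 1))

x = [12, 15, 18, 21, 24, 27]
y = [22, 25, 28, 31, 35, 36]

cov_same = sample_covariance(x, y)                      # variables rise together
cov_opposite = sample_covariance(x, [-v for v in y])    # Y mirrored: opposite movement

print("cov(X, Y)  =", round(cov_same, 4))
print("cov(X, -Y) =", round(cov_opposite, 4))
```

The two results have equal size and opposite sign, which is all the sign-based interpretation relies on.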

The magnitude, however, is nearly uninterpretable, and that is the key limitation this practical exposes. Covariance carries the product of the units of X and Y (cm × kg, marks²…), so its size changes if you merely rescale a variable — measure X in metres instead of centimetres and the covariance shrinks 100-fold while the relationship itself is unchanged. There is no fixed "large" or "small". This is exactly the defect the Pearson correlation of Practical 12 repairs: dividing by σ_X σ_Y cancels the units and pins the result into [−1, 1].
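The rescaling effect is easy to check numerically. The sketch below assumes, purely for illustration, that X is recorded in centimetres (the practical does not state units) and converts it to metres. The covariance drops by a factor of 100 while the Pearson correlation from np.corrcoef stays the same.

```python
import numpy as np

# Assumption for illustration only: X is measured in centimetres.
x_cm = np.array([12, 15, 18, 21, 24, 27], dtype=float)
y = np.array([22, 25, 28, 31, 35, 36], dtype=float)
x_m = x_cm / 100.0  # the same measurements expressed in metres

cov_cm = np.cov(x_cm, y, ddof=1)[0, 1]
cov_m = np.cov(x_m, y, ddof=1)[0, 1]

r_cm = np.corrcoef(x_cm, y)[0, 1]
r_m = np.corrcoef(x_m, y)[0, 1]

print("covariance, X in cm:", round(cov_cm, 4))
print("covariance, X in m :", round(cov_m, 4))
print("ratio              :", round(cov_cm / cov_m, 4))
print("correlation, cm vs m:", round(r_cm, 4), round(r_m, 4))
```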

The divisor n − 1 (ddof=1) is Bessel's correction: sample deviations are measured from the sample mean, which is itself fitted to the data and so consumes one degree of freedom. Dividing by n would systematically shrink the estimate toward zero, understating the magnitude of the population covariance. Finally, np.cov(x, y) returns a 2 × 2 matrix, not a single number: the diagonal holds Var(X) and Var(Y) (a variable's covariance with itself is its variance), and the two off-diagonal cells both hold cov(X, Y), so the matrix is symmetric.
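Both points in this paragraph can be verified with a short sketch on the practical's dataset. The variable names sample and population are illustrative. The code compares the n − 1 and n divisors, then checks that the diagonal matches np.var with the same ddof and that the matrix equals its transpose.

```python
import numpy as np

x = np.array([12, 15, 18, 21, 24, 27], dtype=float)
y = np.array([22, 25, 28, 31, 35, 36], dtype=float)

sample = np.cov(x, y, ddof=1)      # divisor n - 1 (Bessel's correction)
population = np.cov(x, y, ddof=0)  # divisor n

print("sample cov(X, Y)    :", round(sample[0, 1], 4))
print("population cov(X, Y):", round(population[0, 1], 4))

# Diagonal cells are the variances computed with the same ddof.
diag_matches_var = (
    np.isclose(sample[0, 0], np.var(x, ddof=1))
    and np.isclose(sample[1, 1], np.var(y, ddof=1))
)

# Off-diagonal cells are equal, so the matrix equals its transpose.
is_symmetric = np.allclose(sample, sample.T)

print("diagonal equals variances:", diag_matches_var)
print("matrix is symmetric      :", is_symmetric)
```

With n = 6 the two versions differ by the factor (n − 1) / n = 5/6, so the gap narrows as the sample grows.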

Dataset

Index   X    Y
0       12   22
1       15   25
2       18   28
3       21   31
4       24   35
5       27   36

Means: x̄ = 117 / 6 = 19.5, ȳ = 177 / 6 = 29.5.

Procedure

  1. Define x and y as float NumPy arrays of 6 values each and wrap them in the DataFrame df for a clean printout.
  2. Print the data table.
  3. Compute cov_matrix = np.cov(df["X"], df["Y"], ddof=1) — a 2 × 2 sample covariance matrix.
  4. Print the matrix and identify its parts: [0, 0] is Var(X), [1, 1] is Var(Y), and [0, 1] = [1, 0] is cov(X, Y).
  5. Extract cov_xy = cov_matrix[0, 1] and print it rounded to 4 decimals.
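The five steps above might be put together as follows. This is a sketch that follows the procedure's own names (x, y, df, cov_matrix, cov_xy) and assumes pandas is installed; the print labels are my own choice and not necessarily those of the original lab program.

```python
import numpy as np
import pandas as pd

# Step 1: define the data as float arrays and wrap them in a DataFrame.
x = np.array([12, 15, 18, 21, 24, 27], dtype=float)
y = np.array([22, 25, 28, 31, 35, 36], dtype=float)
df = pd.DataFrame({"X": x, "Y": y})

# Step 2: print the data table.
print(df)

# Step 3: 2 x 2 sample covariance matrix (divisor n - 1).
cov_matrix = np.cov(df["X"], df["Y"], ddof=1)

# Step 4: print the matrix and label its parts.
print(cov_matrix)
print("Var(X)    =", round(cov_matrix[0, 0], 4))  # cell [0, 0]
print("Var(Y)    =", round(cov_matrix[1, 1], 4))  # cell [1, 1]

# Step 5: the covariance is the off-diagonal element.
cov_xy = cov_matrix[0, 1]
print("cov(X, Y) =", round(cov_xy, 4))
```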

Interpretation of Results

Working the formula by hand: the X deviations from 19.5 are (−7.5, −4.5, −1.5, 1.5, 4.5, 7.5) and the Y deviations from 29.5 are (−7.5, −4.5, −1.5, 1.5, 5.5, 6.5). Every pair shares its sign, so all six products are positive: 56.25 + 20.25 + 2.25 + 2.25 + 24.75 + 48.75 = 154.5, and cov(X, Y) = 154.5 / 5 = 30.9 — matching the program's output. The full printed matrix is [[31.5, 30.9], [30.9, 30.7]]: Var(X) = 31.5 and Var(Y) = 30.7 on the diagonal. The positive 30.9 confirms X and Y climb together — visible in the raw data, where Y increases in near-lockstep as X steps up by 3. But is 30.9 "strong"? Covariance alone cannot say; normalising gives r = 30.9 / √(31.5 × 30.7) ≈ 0.9937, revealing an almost perfectly linear relationship. The pair of numbers — cov = 30.9, r = 0.99 — is the whole lesson: covariance detects the direction, correlation quantifies the strength.
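The hand calculation can be reproduced term by term. The sketch below uses illustrative variable names and the same six observations. It lists the deviation products, sums them, divides by n − 1, and then normalises by the two sample standard deviations to obtain r.

```python
import numpy as np

x = np.array([12, 15, 18, 21, 24, 27], dtype=float)
y = np.array([22, 25, 28, 31, 35, 36], dtype=float)

dx = x - x.mean()    # deviations of X from 19.5
dy = y - y.mean()    # deviations of Y from 29.5
products = dx * dy   # one product per observation

total = products.sum()
cov_xy = total / (len(x) - 1)

# Normalise by the sample standard deviations to get Pearson's r.
var_x = np.var(x, ddof=1)
var_y = np.var(y, ddof=1)
r = cov_xy / np.sqrt(var_x * var_y)

print("products :", products.tolist())
print("sum      :", total)
print("cov(X, Y):", round(cov_xy, 4))
print("r        :", round(r, 4))
```

All six products are positive because each pair of deviations shares its sign, which is why the sum cannot cancel toward zero here.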

Common Mistakes

  1. Reporting the whole matrix (or the diagonal variance 31.5) as "the covariance" — cov(X, Y) is specifically the off-diagonal element cov_matrix[0, 1].
  2. Judging relationship strength from covariance magnitude — 30.9 is unit-dependent and would change under any rescaling; use correlation for strength.
  3. Using ddof=0 (population divisor n) for sample data: it shrinks the estimate toward zero (154.5 / 6 = 25.75 here instead of 30.9); sample statistics need Bessel's n − 1.

🎯 Viva Questions

  1. What does the sign of covariance tell you? Positive: variables move together; negative: they move oppositely; near zero: no linear co-movement.
  2. Why is covariance magnitude hard to interpret? It carries the product of the two variables' units and changes under rescaling, so there is no universal "large" value.
  3. What lies on the diagonal of np.cov(x, y)? The variances — Var(X) = 31.5 and Var(Y) = 30.7 here; a variable's covariance with itself is its variance.
  4. What is Bessel's correction and why divide by n − 1? Deviations are taken from the sample mean, which uses up one degree of freedom; dividing by n − 1 removes the resulting downward bias.
  5. How do you get correlation from this matrix? r = cov(X, Y) / √(Var(X) · Var(Y)) = 30.9 / √(31.5 × 30.7) ≈ 0.9937.
  6. Why is the covariance matrix symmetric? Because cov(X, Y) = cov(Y, X) — the deviation products are the same regardless of order.