Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

Practical 11: One-Way ANOVA Implementation

Lesson 11 of 15 in the free Data Visualisation and Analytics Lab notes on Siksha Sarovar, written by Rohit Jangra.

Aim

To implement one-way ANOVA (Analysis of Variance) manually with NumPy — computing the sums of squares (SS), degrees of freedom, mean squares (MS) and the F-statistic for three groups of marks — and to present the result as a standard ANOVA table.

CO Mapping: CO1, CO2, CO3, CO5

Theory

One-way ANOVA tests whether three or more group means are equal. The hypotheses are:

  • H₀: μ_A = μ_B = μ_C (all group means equal);
  • H₁: at least one group mean differs.

The core idea is to split the total variability of all observations around the grand mean into two independent sources:

  • Between-group variability (SS_between): how far each group's mean sits from the grand mean, weighted by group size — the part of the spread explained by group membership.
  • Within-group variability (SS_within): how much observations scatter around their own group mean — pure noise that group membership cannot explain.

The identity SS_total = SS_between + SS_within always holds. Each SS is divided by its degrees of freedom (df_between = k − 1, df_within = n − k) to give mean squares, and the test statistic is their ratio:

F = MS_between / MS_within

Under H₀ both mean squares estimate the same error variance, so F ≈ 1. If the groups genuinely differ, MS_between inflates and F grows well beyond 1. The computed F is compared against the critical value of the F-distribution with (k − 1, n − k) degrees of freedom; here that is (2, 12), whose 5% critical value is about 3.89. Why not just run three t-tests? Each pairwise t-test carries its own 5% false-alarm risk, and the risks compound: across three tests the chance of at least one false alarm rises to roughly 1 − 0.95³ ≈ 14% (treating the tests as independent). ANOVA asks the question once, with one controlled error rate.
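
The compounding risk and the quoted critical value can both be checked with a few lines of standard-library Python. This is a sketch under two stated assumptions: the three pairwise tests are treated as independent (they are not exactly, so the figure is only an approximation), and the critical value uses a closed form of the F-distribution that exists only when the numerator degrees of freedom equal 2, as they do here. The function name f_critical_df1_2 is illustrative and not part of the practical.

```python
from math import comb

alpha = 0.05
k = 3    # number of groups
n = 15   # total observations

# Number of pairwise t-tests needed to compare every pair of groups.
n_tests = comb(k, 2)

# Chance of at least one false alarm if the tests were independent
# (an approximation: pairwise tests that share a group are correlated).
familywise_error = 1 - (1 - alpha) ** n_tests


def f_critical_df1_2(alpha, df_within):
    """Upper-tail critical value of F(2, df_within).

    Assumption: valid only for 2 numerator degrees of freedom, where
    P(F > x) = (1 + 2x / df_within) ** (-df_within / 2).
    """
    return (df_within / 2) * (alpha ** (-2 / df_within) - 1)


f_crit = f_critical_df1_2(alpha, n - k)

print(f"pairwise tests: {n_tests}")
print(f"familywise error (independence approx.): {familywise_error:.4f}")
print(f"F critical (2, {n - k}) at 5%: {f_crit:.2f}")
```

For other numerator degrees of freedom no such shortcut applies, and a statistics library or a printed F-table would be the usual route.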

Dataset

Marks of three groups of 5 students each (k = 3, n = 15):

Group   Values                Mean
A       72, 75, 78, 71, 74    74.0
B       81, 85, 79, 84, 83    82.4
C       66, 69, 70, 68, 67    68.0

Grand mean = 1122 / 15 = 74.8.
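
The group means and the grand mean can be confirmed with plain Python before any NumPy is involved. This is a minimal sketch; the dictionary layout and the variable names are illustrative choices, not part of the practical.

```python
# Marks from the dataset table, keyed by group label.
groups = {
    "A": [72, 75, 78, 71, 74],
    "B": [81, 85, 79, 84, 83],
    "C": [66, 69, 70, 68, 67],
}

# Mean of each group.
group_means = {name: sum(v) / len(v) for name, v in groups.items()}

# Grand mean: pool all 15 marks, then average.
all_values = [x for v in groups.values() for x in v]
grand_total = sum(all_values)
grand_mean = grand_total / len(all_values)

for name, m in group_means.items():
    print(f"Group {name}: mean = {m:.1f}")
print(f"Grand mean = {grand_total} / {len(all_values)} = {grand_mean:.1f}")
```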

Procedure

  1. Define group_a, group_b, group_c as NumPy arrays and collect them in groups; build all_values with np.concatenate.
  2. Compute grand_mean = all_values.mean() (74.8), plus k = 3 and n = 15.
  3. Compute ss_between as Σ nᵢ (x̄ᵢ − grand mean)² over the groups, and ss_within as the sum of each group's squared deviations from its own mean.
  4. Compute ss_total directly from all 15 values and verify it equals ss_between + ss_within.
  5. Divide by the degrees of freedom (df_between = 2, df_within = 12) to get ms_between and ms_within, then f_value = ms_between / ms_within.
  6. Assemble anova_table as a DataFrame with Source, SS, df, MS and F columns and print it rounded to 4 decimals.
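
The six steps above can be sketched as follows. The variable names come from the procedure; the table layout (a "Total" row, and NaN in cells that do not apply) is an assumption about presentation rather than something the practical specifies.

```python
import numpy as np
import pandas as pd

# Step 1: the three groups of marks from the Dataset section.
group_a = np.array([72, 75, 78, 71, 74])
group_b = np.array([81, 85, 79, 84, 83])
group_c = np.array([66, 69, 70, 68, 67])
groups = [group_a, group_b, group_c]
all_values = np.concatenate(groups)

# Step 2: grand mean, number of groups, total sample size.
grand_mean = all_values.mean()
k = len(groups)
n = all_values.size

# Step 3: between-group SS (weighted by group size) and within-group SS.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Step 4: total SS computed directly, then checked against the identity.
ss_total = ((all_values - grand_mean) ** 2).sum()
assert np.isclose(ss_total, ss_between + ss_within)

# Step 5: degrees of freedom, mean squares and the F-statistic.
df_between = k - 1
df_within = n - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_value = ms_between / ms_within

# Step 6: ANOVA table; cells that do not apply are left as NaN.
anova_table = pd.DataFrame({
    "Source": ["Between", "Within", "Total"],
    "SS": [ss_between, ss_within, ss_total],
    "df": [df_between, df_within, n - 1],
    "MS": [ms_between, ms_within, np.nan],
    "F": [f_value, np.nan, np.nan],
})
print(anova_table.round(4).to_string(index=False))
```

The assert in step 4 stops the script if the sums of squares fail to add up, which catches most arithmetic slips before the F ratio is formed.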

Interpretation of Results

Tracing the arithmetic: SS_between = 5(74.0 − 74.8)² + 5(82.4 − 74.8)² + 5(68.0 − 74.8)² = 3.2 + 288.8 + 231.2 = 523.2; SS_within = 30 + 23.2 + 10 = 63.2 (groups A, B and C respectively); SS_total = 586.4, so the identity checks out. Then MS_between = 523.2 / 2 = 261.6, MS_within = 63.2 / 12 ≈ 5.2667, and F ≈ 49.6709.

That is more than twelve times the 5% critical value F(2,12) ≈ 3.89, so H₀ is rejected at the 5% level: the three group means (74.0, 82.4, 68.0) are not chance fluctuations around a common mean. The table itself tells the story. Group membership explains 523.2 of the 586.4 total sum of squares (about 89%), while within-group noise is small in every group (no mark lies more than 4 points from its own group mean). Note that ANOVA only says some difference exists; identifying which pairs differ needs a post-hoc test such as Tukey's HSD.
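
The headline numbers can be re-derived from the two sums of squares alone. The sketch below also computes a p-value, which the practical itself does not report. It relies on a closed-form upper tail of the F-distribution that is valid only for 2 numerator degrees of freedom (an assumption that happens to fit k = 3). The names eta_squared and p_value are illustrative.

```python
# Sums of squares and degrees of freedom from the arithmetic above.
ss_between, ss_within = 523.2, 63.2
df_between, df_within = 2, 12

ss_total = ss_between + ss_within
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_value = ms_between / ms_within

# Share of the total sum of squares explained by group membership.
eta_squared = ss_between / ss_total

# Upper-tail probability P(F > f_value) for F(2, df_within).
# Assumption: this closed form holds only when the numerator df is 2.
p_value = (1 + 2 * f_value / df_within) ** (-df_within / 2)

f_crit = 3.89  # 5% critical value quoted in the Theory section
print(f"F = {f_value:.4f} ({f_value / f_crit:.1f} times the critical value)")
print(f"explained share of SS_total = {eta_squared:.3f}")
print(f"p-value = {p_value:.2e}")
```

The p-value comes out several orders of magnitude below 0.05, which agrees with the comparison against the critical value.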

Common Mistakes

  1. Forgetting to weight SS_between by group size len(g) — with unequal groups the unweighted version is simply wrong.
  2. Mixing up the degrees of freedom (using n − 1 for within, or k for between): the mean squares then come out wrong and the F ratio is compared against the wrong distribution.
  3. Concluding from a large F that every group differs from every other — ANOVA is an omnibus test; pairwise conclusions need post-hoc analysis.
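
The first two mistakes can be caught mechanically. The sketch below uses a hypothetical unbalanced variant of the data (group C trimmed to its first three marks, which is not part of the practical) to show that the identity SS_total = SS_between + SS_within holds only when each group is weighted by its size. The helper mean and all variable names are illustrative.

```python
from math import isclose


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical unbalanced variant: group C trimmed to its first 3 marks.
groups = [
    [72, 75, 78, 71, 74],
    [81, 85, 79, 84, 83],
    [66, 69, 70],
]
all_values = [x for g in groups for x in g]
grand_mean = mean(all_values)  # mean of all values, not of the group means

ss_total = sum((x - grand_mean) ** 2 for x in all_values)
ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)

# Correct: each squared deviation is weighted by its group size.
ss_between_weighted = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Mistake 1: the len(g) weight is missing.
ss_between_unweighted = sum((mean(g) - grand_mean) ** 2 for g in groups)

weighted_ok = isclose(ss_total, ss_between_weighted + ss_within)
unweighted_ok = isclose(ss_total, ss_between_unweighted + ss_within)
print("identity holds with weights:   ", weighted_ok)
print("identity holds without weights:", unweighted_ok)

# Guard against mistake 2: the degrees of freedom must add up to n - 1.
k, n = len(groups), len(all_values)
df_between, df_within = k - 1, n - k
assert df_between + df_within == n - 1
```

The first check prints True and the second False, so the identity doubles as a cheap test that the weighting was applied.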

🎯 Viva Questions

  1. What are H₀ and H₁ in one-way ANOVA? H₀: all group means are equal; H₁: at least one differs.
  2. What does the F-statistic measure? The ratio of between-group variance to within-group variance — how much more the groups differ from each other than their members do internally.
  3. Why is F ≈ 1 expected under H₀? Both MS_between and MS_within then estimate the same underlying error variance.
  4. What are the degrees of freedom here? Between: k − 1 = 2; Within: n − k = 12; Total: n − 1 = 14.
  5. What identity links the sums of squares? SS_total = SS_between + SS_within (586.4 = 523.2 + 63.2 in this data).
  6. Why use ANOVA instead of multiple t-tests? Repeated t-tests inflate the overall Type-I error rate; ANOVA tests all means at once with a single controlled α.