Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

2.8 Non-Parametric: Chi-Square Test

Lesson 16 of 32 in the free Data Visualisation and Analytics notes on Siksha Sarovar, written by Rohit Jangra.

Chi-Square (χ²): Analyzing Categorical Data

1. Parametric vs. Non-Parametric

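Parametric tests (such as the t-test and ANOVA) assume the data follow a specific distribution, usually the normal distribution. When data violate these assumptions, or when the data are nominal/ordinal categories, we use Non-Parametric (distribution-free) tests. For categorical count data, the Chi-Square test is the most widely used of these.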

2. Chi-Square Test for Independence

Evaluates whether two categorical variables are associated (dependent) or independent.

  • H₀: Variables A and B are independent (knowing A tells you nothing about B).
  • H₁: Variables A and B are dependent.
  • Data Structure: Contingency Table (Cross-tabulation).

Step-by-Step Calculation:

  1. Calculate the Expected Frequency (E) for each cell:
     E = (Row Total × Column Total) / Grand Total

  2. Apply the χ² formula:
     χ² = Σ [ (Observed - Expected)² / Expected ]

  3. Determine the Degrees of Freedom:
     df = (Rows - 1) × (Columns - 1)

  4. Compare the computed χ² with the critical value for that df at the chosen significance level α (see the critical-value table below). If χ² exceeds the critical value, reject H₀.
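The four steps above can be sketched in plain Python using only the standard library. This is a minimal illustration, not a library API: the function name `chi_square_independence` is hypothetical, and it assumes the table is a list of rows of raw counts. It is run on the Science/Arts × Python/Java survey counts from the worked example in Section 6. In practice, SciPy's `scipy.stats.chi2_contingency` performs the same calculation and also returns a p-value; note that it applies Yates' correction by default when df = 1.

```python
def chi_square_independence(observed):
    """Return (chi2, df, expected) for an r x c table of raw counts."""
    n_rows = len(observed)
    n_cols = len(observed[0])
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(row[j] for row in observed) for j in range(n_cols)]
    grand_total = sum(row_totals)

    # Step 1: E = (row total * column total) / grand total for every cell
    expected = [
        [row_totals[i] * col_totals[j] / grand_total for j in range(n_cols)]
        for i in range(n_rows)
    ]

    # Step 2: chi2 = sum over all cells of (O - E)^2 / E
    chi2 = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(n_rows)
        for j in range(n_cols)
    )

    # Step 3: df = (rows - 1) * (columns - 1)
    df = (n_rows - 1) * (n_cols - 1)
    return chi2, df, expected


# Survey counts from Section 6: rows = Science, Arts; columns = Python, Java
survey = [[70, 30],
          [50, 50]]
chi2, df, expected = chi_square_independence(survey)
print(f"chi2 = {chi2:.3f}, df = {df}")

# Step 4: compare with the alpha = 0.05 critical value for df = 1
CRITICAL_DF1 = 3.841
print("Reject H0" if chi2 > CRITICAL_DF1 else "Fail to reject H0")
```

It reproduces the hand calculation (χ² ≈ 8.333 with df = 1), so H₀ is rejected at α = 0.05. The same function works unchanged for larger r × c tables.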

3. Chi-Square Goodness of Fit Test

Evaluates how well an observed distribution matches an expected theoretical distribution.

  • Example: A random number generator in Python produces 600 integers between 1 and 6.
  • Expected: If the generator is fair, each number should appear 600 / 6 = 100 times.
  • Observed: [1: 95, 2: 110, 3: 105, 4: 90, 5: 102, 6: 98]
  • Run the Goodness of Fit test with df = k - 1 (where k = number of categories; here df = 6 - 1 = 5). If χ² exceeds the critical value (equivalently, p < 0.05), there is significant evidence that the generator is biased rather than uniformly random.
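A minimal standard-library sketch of the Goodness of Fit calculation for the dice-style example above. The function name `chi_square_goodness_of_fit` is hypothetical; the critical value 11.070 (df = 5, α = 0.05) is taken from the quick-reference table at the end of this lesson.

```python
def chi_square_goodness_of_fit(observed, expected):
    """Return (chi2, df) comparing observed counts with expected counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # df = k - 1
    return chi2, df


observed = [95, 110, 105, 90, 102, 98]   # counts of the numbers 1..6 from the example
expected = [sum(observed) / 6] * 6       # fair generator: 600 / 6 = 100 each
chi2, df = chi_square_goodness_of_fit(observed, expected)
print(f"chi2 = {chi2:.2f}, df = {df}")

CRITICAL_DF5 = 11.070                    # alpha = 0.05, df = 5
print("Biased" if chi2 > CRITICAL_DF5 else "No evidence of bias")
```

For these counts χ² = (25 + 100 + 25 + 100 + 4 + 4) / 100 = 2.58 with df = 5, far below 11.070, so the observed variation is consistent with a fair generator.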

4. Yates' Continuity Correction

The Chi-Square distribution is continuous, but categorical counts are discrete. For a 2×2 contingency table (df = 1), this mismatch can inflate the χ² value and increase Type I errors. Yates' Correction compensates by subtracting 0.5 from the absolute difference |O - E| before squaring:

χ²_yates = Σ [ (|O - E| - 0.5)² / E ]
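A minimal sketch of the corrected statistic for a 2×2 table in plain Python. The function name `chi_square_yates` is hypothetical. One assumption goes beyond the formula above: the adjusted difference is clipped at zero, max(|O - E| - 0.5, 0), a common safeguard so that a cell with |O - E| < 0.5 does not contribute a spurious positive term.

```python
def chi_square_yates(observed):
    """Yates-corrected chi-square for a 2 x 2 table of raw counts."""
    (a, b), (c, d) = observed
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected frequency
            # Subtract 0.5 from |O - E| before squaring
            # (clipping at 0 is an added safeguard, not part of the formula above)
            adjusted = max(abs(o - e) - 0.5, 0.0)
            chi2 += adjusted ** 2 / e
    return chi2


# Survey table from the worked example in Section 6
print(round(chi_square_yates([[70, 30], [50, 50]]), 2))
```

On the Section 6 survey table every cell has |O - E| = 10, so the corrected statistic is 2 × 9.5²/60 + 2 × 9.5²/40 ≈ 7.52, slightly smaller than the uncorrected 8.33 but still above the df = 1 critical value of 3.841.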

5. Core Assumptions

  1. Data must be raw frequencies (counts), not percentages or ratios.
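  2. Categories are mutually exclusive: each observation is independent and is counted in exactly one cell.
  3. No expected frequency E should be < 1, and no more than 20% of expected frequencies should be < 5. (If this is violated, for example in a small 2×2 table, use Fisher's Exact Test instead.)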
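Assumption 3 is a rule of thumb that is easy to check mechanically. This sketch (hypothetical function name `expected_counts_ok`) takes a table of expected frequencies E and applies exactly the two thresholds stated above. The second table in the usage lines is an illustrative input only, not data from this lesson.

```python
def expected_counts_ok(expected):
    """Rule of thumb: no E < 1, and at most 20% of cells with E < 5."""
    cells = [e for row in expected for e in row]
    if min(cells) < 1:
        return False
    small = sum(1 for e in cells if e < 5)
    return small / len(cells) <= 0.20


# Expected counts from the worked example in Section 6
print(expected_counts_ok([[60, 40], [60, 40]]))
# Illustrative small table: 2 of 4 cells (50%) have E < 5
print(expected_counts_ok([[3.0, 7.0], [4.5, 10.5]]))
```

The Section 6 survey passes easily; the illustrative table fails because half of its cells have E < 5, which is the situation where Fisher's Exact Test is the safer choice.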

6. Complete Worked Example: Test for Independence

Problem: 200 students were surveyed about their preferred programming language (Python/Java) and their background (Science/Arts). Test at α = 0.05 whether preference is independent of background.

Observed Frequencies (O):

              Python   Java   Row Total
Science           70     30         100
Arts              50     50         100
Col Total        120     80         200

Step 1: Expected Frequencies (E):

  • E(Science, Python) = (100 × 120) / 200 = 60
  • E(Science, Java) = (100 × 80) / 200 = 40
  • E(Arts, Python) = (100 × 120) / 200 = 60
  • E(Arts, Java) = (100 × 80) / 200 = 40

Step 2: Chi-Square Statistic:

χ² = (70-60)²/60 + (30-40)²/40 + (50-60)²/60 + (50-40)²/40
χ² = 100/60 + 100/40 + 100/60 + 100/40
χ² = 1.67 + 2.50 + 1.67 + 2.50 = 8.33

Step 3: Degrees of Freedom: df = (2-1) × (2-1) = 1

Step 4: Critical Value: χ²₀.₀₅,₁ = 3.841

Step 5: Conclusion: χ² = 8.33 > 3.841 → Reject H₀.

At α = 0.05, there is a significant association between student background and programming language preference.
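The critical-value comparison can be complemented by a p-value. For df = 1 there is a closed form using only Python's standard library: a χ² variable with one degree of freedom is the square of a standard normal Z, so P(χ² > x) = P(|Z| > √x) = erfc(√(x/2)). The helper name `chi2_p_value_df1` is hypothetical and only valid for df = 1; for other df, a library routine such as `scipy.stats.chi2.sf` would be the usual choice.

```python
import math


def chi2_p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with df = 1.

    Uses chi2 = Z**2 for a standard normal Z, so
    P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


chi2 = 25 / 3          # exact value of the 8.33 from Step 2 (10/3 + 5)
p = chi2_p_value_df1(chi2)
print(f"p = {p:.4f}")
print("Reject H0" if p < 0.05 else "Fail to reject H0")
```

For χ² = 8.33 this gives p ≈ 0.004, well below α = 0.05, which agrees with the decision to reject H₀. As a sanity check, plugging in the critical value 3.841 returns p ≈ 0.05.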

7. Quick Reference: Chi-Square Critical Values (α = 0.05)

df    χ² Critical Value
1     3.841
2     5.991
3     7.815
4     9.488
5     11.070
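The table values are the points where the upper-tail probability of the χ² distribution equals α = 0.05. As a sketch using only the standard library (the function names `chi2_sf` and `chi2_critical` are hypothetical), the tail probability for integer df can be written in closed form, and the critical value found by bisection. In practice, `scipy.stats.chi2.ppf(1 - alpha, df)` returns the same numbers directly.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{(df-1)/2} (x/2)^(j - 1/2) / Gamma(j + 1/2)
    tail = math.erfc(math.sqrt(half))
    tail += math.exp(-half) * sum(
        half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1)
    )
    return tail


def chi2_critical(df, alpha=0.05):
    """Find x with P(X > x) = alpha by bisection (chi2_sf decreases as x grows)."""
    lo, hi = 0.0, 200.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid   # tail still larger than alpha: critical value lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


for df in range(1, 6):
    print(df, f"{chi2_critical(df):.3f}")
```

Running the loop prints 3.841, 5.991, 7.815, 9.488 and 11.070 for df = 1 to 5, matching the table. Changing `alpha` gives the critical values for other significance levels.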