Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

Practical 6: Seaborn Line, Dist, Lm, and Count Plot

Lesson 6 of 15 in the free Data Visualisation and Analytics Lab notes on Siksha Sarovar, written by Rohit Jangra.

Aim

To produce four seaborn statistical plots — line plot, distribution plot (histogram + KDE), regression plot (lmplot) and count plot — from one study-habits dataset, and to learn which analytical question each plot family answers.

CO Mapping: CO1, CO2, CO3

Theory

Seaborn is a statistical layer over matplotlib: you hand it a tidy DataFrame (one row per observation, one column per variable), name the columns of interest, and it handles aggregation, styling and even model fitting. The four plots span the main families:

  • lineplot (relational): how one variable moves along an ordered axis.
  • histplot with kde=True (distribution): the shape of a single variable — centre, spread, skew, modality. Bin width and KDE bandwidth are smoothing choices, not facts.
  • lmplot (regression): a scatter plus an ordinary-least-squares fitted line with a translucent 95% confidence band — an inferential statement drawn as ink.
  • countplot (categorical): frequencies of category levels — a bar chart seaborn counts for you.

The key habit: choose the plot from the question (trend? shape? relationship? composition?), never from familiarity.
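To make the tidy-data requirement concrete, here is a small sketch using a hypothetical wide table. The toy values and the column names Test1 and Test2 are invented for illustration. pandas' DataFrame.melt reshapes the table so that each row is one student-test observation. That long form is what seaborn's x=, y= and hue= arguments are designed to consume.

```python
import pandas as pd

# Hypothetical wide table: one row per student, one column per test (toy values)
wide = pd.DataFrame({
    "Student": ["A", "B", "C"],
    "Test1": [62, 71, 55],
    "Test2": [68, 70, 61],
})

# melt() stacks Test1/Test2 into rows: one row per (student, test) observation
tidy = wide.melt(id_vars="Student", var_name="Test", value_name="Score")
print(tidy)

# In long form one seaborn call can map the new column to hue, e.g.
# sns.barplot(x="Student", y="Score", hue="Test", data=tidy)
```

With wide data you would need one plotting call per test column. After melting, Test becomes an ordinary variable that seaborn can map to colour.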

Dataset

Generated in the snippet with np.random.seed(7) — 15 students:

Column       Meaning              How generated
Hours        Study hours (1–15)   np.arange(1, 16)
Score        Test score           random integers 45–94, independent of Hours
Department   BCA / BBA / BCom     random choice

Note the deliberate trap: Score is pure noise with respect to Hours.
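The generating snippet itself is not reproduced on this page. The following is therefore only a sketch consistent with the table above. It assumes that after np.random.seed(7) the lab calls np.random.randint(45, 95, 15) for Score and then np.random.choice over the three department names, in that order. The helper name make_study_data is hypothetical. If the lab's snippet draws in a different order or with different arguments, the exact values will differ.

```python
import numpy as np
import pandas as pd

def make_study_data(seed=7):
    """Hypothetical helper: 15-student dataset; draw order and arguments are assumptions."""
    np.random.seed(seed)                          # fix the global NumPy RNG for reproducibility
    hours = np.arange(1, 16)                      # 1, 2, ..., 15 study hours
    score = np.random.randint(45, 95, size=15)    # 45..94 inclusive, never looks at hours
    dept = np.random.choice(["BCA", "BBA", "BCom"], size=15)
    return pd.DataFrame({"Hours": hours, "Score": score, "Department": dept})

df = make_study_data()
print(df.head())
```

Calling make_study_data() twice returns identical frames, which is the reproducibility point behind seeding.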

Procedure

  1. Seed NumPy (reproducibility), build df, and print df.head().
  2. Apply sns.set_style("whitegrid") for a light statistical look.
  3. sns.lineplot(x="Hours", y="Score", data=df, marker="o") → save seaborn_lineplot.png.
  4. sns.histplot(df["Score"], kde=True, bins=7) → save seaborn_distplot.png.
  5. lm = sns.lmplot(x="Hours", y="Score", data=df). lmplot is figure-level: it returns a FacetGrid that owns its own figure, so the save goes through lm.fig.savefig (lm.figure in recent seaborn releases).
  6. sns.countplot(x="Department", data=df) → save seaborn_countplot.png.

Interpretation of Results

The line plot zig-zags with no persistent direction, and the lmplot's fitted line is nearly flat with a wide confidence band. The honest reading is "no evidence that more hours mean higher scores in this data", which is exactly right because Score was generated independently of Hours. The histogram + KDE shows scores spread across the 45–94 range without a strong peak, and the countplot reveals unequal department counts arising purely from sampling chance at n = 15. Learning to read the absence of signal (flat fits, wide bands, ragged small-sample histograms) is as important as spotting patterns, and it protects you from presenting noise as insight.
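To put numbers on "nearly flat" and "wide", this NumPy-only sketch computes the OLS slope, the Pearson r, and a percentile bootstrap interval for the slope. Seaborn's documentation says its lmplot band is also estimated by bootstrap. Seaborn, however, bootstraps the whole regression line rather than reporting a slope interval, so treat this as an analogue, not a reproduction of the band. The data reuse the assumed draw order from the dataset sketch. The bootstrap settings (2000 resamples, default_rng(0)) are arbitrary choices.

```python
import numpy as np
import pandas as pd

# Rebuild the sketch dataset (draw order assumed, as in the dataset sketch)
np.random.seed(7)
df = pd.DataFrame({
    "Hours": np.arange(1, 16),
    "Score": np.random.randint(45, 95, size=15),
})

x = df["Hours"].to_numpy(dtype=float)
y = df["Score"].to_numpy(dtype=float)

# OLS slope (score points per extra hour) and Pearson correlation
slope, intercept = np.polyfit(x, y, 1)
r = np.corrcoef(x, y)[0, 1]

# Bootstrap: refit the slope on students resampled with replacement
rng = np.random.default_rng(0)
boot = []
for _ in range(2000):
    idx = rng.integers(0, len(x), size=len(x))
    if np.ptp(x[idx]) == 0:  # skip degenerate resamples with a single Hours value
        continue
    boot.append(np.polyfit(x[idx], y[idx], 1)[0])
lo, hi = np.percentile(boot, [2.5, 97.5])

print(f"slope = {slope:.2f}, r = {r:.2f}, 95% bootstrap CI for slope = [{lo:.2f}, {hi:.2f}]")
```

An interval that comfortably straddles zero is the numeric counterpart of the flat line and wide band described above.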

Common Mistakes

  1. Using the deprecated sns.distplot — modern seaborn replaces it with histplot / displot.
  2. Treating lmplot like an axes-level function: it creates and owns its whole figure, so use hue/col for comparisons, or switch to the axes-level sns.regplot when the fit has to go into an existing subplot.
  3. Over-interpreting KDE bumps with only 15 observations — smoothing invents wiggles.

🎯 Viva Questions

  1. What is tidy data? One row per observation, one column per variable — the format seaborn expects.
  2. What does the band around the lmplot line mean? The 95% confidence interval for the fitted regression line.
  3. Histogram vs KDE? The histogram counts observations in bins; the KDE is a smoothed continuous density estimate.
  4. Figure-level vs axes-level functions? lmplot/displot create their own figure (FacetGrid); lineplot/histplot draw onto an existing axes.
  5. Why seed the random generator? Reproducibility — every run and every student's plots match.
  6. Which plot shows category frequencies? countplot (equivalently value_counts().plot.bar()).