Aim
To produce four seaborn statistical plots — line plot, distribution plot (histogram + KDE), regression plot (lmplot) and count plot — from one study-habits dataset, and to learn which analytical question each plot family answers.
CO Mapping: CO1, CO2, CO3
Theory
Seaborn is a statistical layer over matplotlib: you hand it a tidy DataFrame (one row per observation, one column per variable), name the columns of interest, and it handles aggregation, styling and even model fitting. The four plots span the main families:
- lineplot (relational): how one variable moves along an ordered axis.
- histplot with kde=True (distribution): the shape of a single variable, meaning its centre, spread, skew and modality. Bin width and KDE bandwidth are smoothing choices, not facts.
- lmplot (regression): a scatter plus an ordinary-least-squares fitted line with a translucent 95% confidence band, an inferential statement drawn as ink.
- countplot (categorical): frequencies of category levels, a bar chart seaborn counts for you.
The key habit: choose the plot from the question (trend? shape? relationship? composition?), never from familiarity.
Dataset
Generated in the snippet with np.random.seed(7) — 15 students:
| Column | Meaning | How generated |
|---|---|---|
| Hours | Study hours (1–15) | np.arange(1, 16) |
| Score | Test score | random integers 45–94, independent of Hours |
| Department | BCA / BBA / BCom | random choice |
Note the deliberate trap: Score is pure noise with respect to Hours.
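A minimal sketch of how such a table could be built is given below. The exact calls and their order are assumptions (the table above only names the ranges), so the values may differ from the lab's own snippet even with the same seed.

```python
import numpy as np
import pandas as pd

np.random.seed(7)  # fixed seed so every run produces the same table

n = 15
df = pd.DataFrame({
    "Hours": np.arange(1, n + 1),                  # 1, 2, ..., 15
    "Score": np.random.randint(45, 95, size=n),    # upper bound is exclusive: 45..94
    "Department": np.random.choice(["BCA", "BBA", "BCom"], size=n),
})

print(df.head())
```

Because Score never looks at Hours while it is generated, any apparent trend between the two columns is chance.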
Procedure
- Seed NumPy (reproducibility), build df, and print df.head().
- Apply sns.set_style("whitegrid") for a light statistical look.
- sns.lineplot(x="Hours", y="Score", data=df, marker="o") → save seaborn_lineplot.png.
- sns.histplot(df["Score"], kde=True, bins=7) → save seaborn_distplot.png.
- lm = sns.lmplot(x="Hours", y="Score", data=df): lmplot is figure-level, hence the save goes through lm.fig.savefig.
- sns.countplot(x="Department", data=df) → save seaborn_countplot.png.
Interpretation of Results
The line plot zig-zags with no persistent direction; the lmplot's fitted line is nearly flat and its confidence band is wide — the honest reading is "no evidence that more hours mean higher scores in this data", which is exactly right because Score was generated independently of Hours. The histogram + KDE shows scores spread across 45–95 without a strong peak, and the countplot reveals unequal department counts arising purely from sampling chance at n = 15. Learning to read the absence of signal — flat fits, wide bands, ragged small-sample histograms — is as important as spotting patterns, and it protects you from presenting noise as insight.
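The visual reading can be backed with numbers. The sketch below fits the same kind of straight line by least squares and bootstraps the slope, which is similar in spirit to how seaborn draws its band by resampling. The data-generation calls, the bootstrap seed 0 and the 2000 resamples are assumptions made for this illustration.

```python
import numpy as np

np.random.seed(7)
hours = np.arange(1, 16)
score = np.random.randint(45, 95, size=15)  # assumed generation, independent of hours

# Ordinary least squares line and Pearson correlation.
slope, intercept = np.polyfit(hours, score, deg=1)
r = np.corrcoef(hours, score)[0, 1]

# Percentile bootstrap for the slope: resample students with replacement.
rng = np.random.default_rng(0)  # separate generator so the dataset above is untouched
boot_slopes = []
for _ in range(2000):
    idx = rng.integers(0, len(hours), size=len(hours))
    boot_slopes.append(np.polyfit(hours[idx], score[idx], deg=1)[0])
lo, hi = np.percentile(boot_slopes, [2.5, 97.5])

print(f"slope = {slope:.2f} points per hour, r = {r:.2f}")
print(f"95% bootstrap interval for the slope: [{lo:.2f}, {hi:.2f}]")
```

If the printed interval contains zero, the data are compatible with no relationship at all, which is the numerical counterpart of a flat line inside a wide band.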
Common Mistakes
- Using the deprecated sns.distplot: modern seaborn replaces it with histplot/displot.
- Treating lmplot like an axes-level function (it owns its whole figure; use hue/col for comparisons instead of subplots).
- Over-interpreting KDE bumps with only 15 observations: smoothing invents wiggles.
🎯 Viva Questions
- What is tidy data? One row per observation, one column per variable — the format seaborn expects.
- What does the band around the lmplot line mean? The 95% confidence interval for the fitted regression line.
- Histogram vs KDE? The histogram counts observations in bins; the KDE is a smoothed continuous density estimate.
- Figure-level vs axes-level functions? lmplot/displot create their own figure (FacetGrid); lineplot/histplot draw onto an existing axes.
- Why seed the random generator? Reproducibility — every run and every student's plots match.
- Which plot shows category frequencies? countplot (equivalently value_counts().plot.bar()).
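The equivalence in the last answer can be verified by comparing bar heights with the pandas tally. This is a sketch under assumptions: the dataset calls are guessed from the Dataset table, and passing order explicitly is a choice made here so that the bars and the counts line up.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

np.random.seed(7)
df = pd.DataFrame({
    "Department": np.random.choice(["BCA", "BBA", "BCom"], size=15),
})

# pandas does the counting explicitly.
counts = df["Department"].value_counts()
order = counts.index.tolist()

# seaborn counts internally; fixing the order makes the bars comparable.
fig, ax = plt.subplots()
sns.countplot(x="Department", data=df, order=order, ax=ax)
bar_heights = [int(round(p.get_height())) for p in ax.patches if p.get_width() > 0]

print(counts)
print("bar heights:", bar_heights)
plt.close(fig)
```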