Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

3.5 Seaborn: Categorical & Styling

Lesson 24 of 32 in the free Data Visualisation and Analytics notes on Siksha Sarovar, written by Rohit Jangra.

Seaborn: Categorical Data & Aesthetics

1. Visualizing Categorical Data (catplot)

Use sns.catplot() when one variable is categorical (e.g., "Day of Week") and the other is numerical (e.g., "Total Bill"). Its kind argument selects one of the plot types below.

| Plot Type | Description | When to Use | Strength |
|---|---|---|---|
| Bar Plot | Shows the mean with a confidence interval | Comparing averages between groups | Easy to interpret |
| Count Plot | Shows the count of observations | Checking sample sizes | Simple frequency check |
| Box Plot | Shows median, quartiles, and outliers | Robust comparison of distributions | Shows spread + outliers |
| Violin Plot | Combines Box Plot + KDE | Seeing distribution shape inside categories | Shows density + quartiles |
| Swarm Plot | Shows every single data point, no overlap | Small datasets, individual items | No information hidden |
| Strip Plot | Like Swarm but allows overlap (jittered) | Quick overview of individual values | Fast to render |

Study Deep: Box Plot vs. Violin Plot

While both show the distribution of data across categories:

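  • Box Plot: Focuses on the "5-number summary" (Min, Q1, Median, Q3, Max) and identifies outliers. By default, Seaborn's whiskers stop at the most extreme points within 1.5 × IQR of the box, and anything beyond them is drawn as an individual outlier, so the whisker ends equal the true Min and Max only when there are no outliers. It is clean and efficient for comparing many categories at once.
  • Violin Plot: Adds a Kernel Density Estimate (KDE) to the box plot, which lets you see the "shape" or "density" of the data. If your data is bimodal (has two peaks), a box plot will hide this, but a violin plot will reveal it clearly.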
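
The bimodal case can be checked with a small experiment. The sketch below uses made-up data (the column names group and value are illustrative, not from any real dataset): group "A" has one peak and group "B" has two. It draws the same data as a box plot and as a violin plot side by side.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)

# Hypothetical data: "A" is unimodal, "B" is bimodal (peaks near 35 and 65)
one_peak = rng.normal(loc=50, scale=10, size=300)
two_peaks = np.concatenate([rng.normal(35, 4, 150), rng.normal(65, 4, 150)])
df = pd.DataFrame({
    "group": ["A"] * 300 + ["B"] * 300,
    "value": np.concatenate([one_peak, two_peaks]),
})

# Same data, two views that share the y-axis
fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(9, 4), sharey=True)
sns.boxplot(data=df, x="group", y="value", ax=ax_box)
sns.violinplot(data=df, x="group", y="value", ax=ax_violin)
ax_box.set_title("Box plot")
ax_violin.set_title("Violin plot")
```

Call plt.show() or fig.savefig(...) to view the figure. In the box plot, "B" looks like an ordinary wide distribution. In the violin plot, it should show two separate bulges.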

Code Example:

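import seaborn as sns
tips = sns.load_dataset("tips")  # assumed here: Seaborn's example "tips" dataset (downloaded on first use)
# Box plot of Bill Amount by Day, split by Smoker status
sns.catplot(data=tips, x="day", y="total_bill", hue="smoker", kind="box")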
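
Because catplot is a single entry point, switching plot types is mostly a matter of changing kind. The following is a minimal sketch that uses a small synthetic table in place of the tips dataset, so it runs offline. The bills DataFrame and its values are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(1)
days = ["Thur", "Fri", "Sat", "Sun"]

# Hypothetical bills: 120 rows with a random day and a right-skewed amount
bills = pd.DataFrame({
    "day": rng.choice(days, size=120),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=120).round(2),
})

# Same x/y mapping, different kind; each call returns a FacetGrid
grids = {}
for kind in ["bar", "box", "violin", "strip"]:
    grids[kind] = sns.catplot(
        data=bills, x="day", y="total_bill", kind=kind, order=days, height=3
    )
    grids[kind].set_axis_labels("Day", "Total bill")

# A count plot takes only the categorical variable (no y)
grids["count"] = sns.catplot(data=bills, x="day", kind="count", order=days, height=3)
grids["count"].set_axis_labels("Day", "Count")
```

Each FacetGrid owns its own figure, so the five plots can be saved or shown independently.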

2. Categorical Plot Selection Guide

| Your Goal | Best Plot | Why |
|---|---|---|
| Compare means across groups | Bar Plot | Clear height comparison |
| See full distribution per group | Box Plot / Violin | Shows quartiles, outliers, shape |
| See every individual data point | Swarm / Strip | Nothing hidden |
| Count occurrences per category | Count Plot | Simple frequency |
| See distribution + individual points | Violin + overlay Strip | Combines both views |
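
The "Violin + overlay Strip" combination in the last row can be built by drawing two axes-level plots on the same Axes. This is one possible sketch, using invented exam marks (the scores table and its section and marks columns are assumptions for the example).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(2)

# Hypothetical marks for three class sections, 40 students each
scores = pd.DataFrame({
    "section": np.repeat(["A", "B", "C"], 40),
    "marks": np.concatenate([
        rng.normal(60, 8, 40),
        rng.normal(70, 5, 40),
        rng.normal(55, 12, 40),
    ]),
})

fig, ax = plt.subplots(figsize=(6, 4))
# inner=None removes the mini box plot so the points stay readable
sns.violinplot(data=scores, x="section", y="marks", inner=None, color="lightgray", ax=ax)
# Jittered points drawn on top of the violins
sns.stripplot(data=scores, x="section", y="marks", color="black", size=3, jitter=True, ax=ax)
```

Passing the same ax to both calls is what stacks them. The violin is drawn first so that the points sit on top.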

3. Visualizing Relationships: lmplot

sns.lmplot() draws a scatter plot and fits a linear regression line in one call, with a shaded 95% confidence interval around the line.

  • sns.lmplot(x="age", y="wage", data=df)
  • Interpretation: A narrow shaded band means the fitted line is estimated precisely; a wide band means the slope and intercept are uncertain. The band describes the regression line, not the scatter of individual points.
  • Use hue="category" to split by groups and compare regression lines.
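
To see the hue split in action, here is a sketch on synthetic data. The sector column, the slopes 0.8 and 0.3, and the noise level are all invented for the example. Since lmplot only draws the lines, the slopes are recovered separately with np.polyfit to put numbers next to the picture.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(3)
n = 100  # rows per group

# Hypothetical data: wage grows faster with age in "private" than in "public"
age = rng.uniform(20, 60, size=2 * n)
sector = np.repeat(["private", "public"], n)
slope_per_row = np.where(sector == "private", 0.8, 0.3)
wage = 10 + slope_per_row * age + rng.normal(0, 3, size=2 * n)
df = pd.DataFrame({"age": age, "wage": wage, "sector": sector})

# One regression line (with its confidence band) per sector
g = sns.lmplot(data=df, x="age", y="wage", hue="sector", height=4)

# Least-squares slope per group, for comparison with the plotted lines
fitted_slope = {}
for name, part in df.groupby("sector"):
    slope, intercept = np.polyfit(part["age"], part["wage"], deg=1)
    fitted_slope[name] = slope
    print(f"{name}: slope = {slope:.2f}")
```

With 100 points per group and modest noise, the bands should come out narrow. Reducing n or increasing the noise widens them.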

4. Heatmaps

Ideal for correlation matrices, confusion matrices, and any 2D matrix data:

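# df is assumed to be an existing pandas DataFrame
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm', center=0)
# numeric_only=True: Skip non-numeric columns (pandas 1.5+; without it, pandas 2.0+ raises an error on text columns)
# annot=True: Show numbers in cells
# cmap: Color palette
# center=0: Center colorbar at 0 (for diverging data)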
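
A self-contained version of the correlation heatmap might look like the sketch below. The data table and its columns are made up, and select_dtypes is used here as one way to drop the text column before computing correlations.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(4)
n = 200
hours = rng.uniform(0, 10, n)

# Hypothetical student data: marks rise with study hours, absences fall
data = pd.DataFrame({
    "hours_studied": hours,
    "marks": 40 + 5 * hours + rng.normal(0, 5, n),
    "absences": 12 - hours + rng.normal(0, 2, n),
    "section": rng.choice(["A", "B"], n),  # text column, excluded below
})

# Correlation matrix over the numeric columns only
corr = data.select_dtypes(include="number").corr()

fig, ax = plt.subplots(figsize=(5, 4))
# vmin/vmax pin the color scale to the full correlation range [-1, 1]
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
            center=0, vmin=-1, vmax=1, ax=ax)
```

Fixing vmin and vmax along with center=0 keeps the colors comparable between heatmaps, since the same shade then always means the same correlation.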

5. Aesthetics: Color Palettes

Seaborn's built-in palettes fall into three families, each matched to a kind of data, so that color reinforces the pattern in the data instead of obscuring it.

| Palette Type | Argument | Use Case | Examples |
|---|---|---|---|
| Qualitative | palette="deep" | Distinct categories (Apple, Banana, Orange) | deep, pastel, bright, Set2 |
| Sequential | palette="viridis" | Low to High values (Income, Temperature) | viridis, rocket, Blues, YlOrRd |
| Diverging | palette="vlag" | Centered on zero (Profit/Loss, Vote shift) | vlag, coolwarm, icefire, RdBu |

Choosing a Palette:

  • Categorical data → Qualitative (distinct, unrelated colors)
  • Ordered data → Sequential (a single gradient, running light-to-dark or dark-to-light)
  • Data centered on a midpoint → Diverging (two contrasting colors meeting in the middle)
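
Palettes can also be inspected as plain lists of colors before they are passed to a plot. The sketch below pulls one palette from each family with sns.color_palette. The n_colors values are arbitrary choices for illustration, and the fruit table is invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# One palette per family; each is a list of RGB tuples
qualitative = sns.color_palette("deep", n_colors=3)
sequential = sns.color_palette("viridis", n_colors=5)
diverging = sns.color_palette("vlag", n_colors=7)

for name, pal in [("qualitative", qualitative),
                  ("sequential", sequential),
                  ("diverging", diverging)]:
    print(name, pal.as_hex())

# Hypothetical categorical data colored with the qualitative palette
fruit = pd.DataFrame({
    "weight": [150, 120, 130, 160, 118, 140],
    "price": [30, 12, 25, 33, 10, 27],
    "kind": ["Apple", "Banana", "Orange", "Apple", "Banana", "Orange"],
})
fig, ax = plt.subplots(figsize=(5, 4))
sns.scatterplot(data=fruit, x="weight", y="price", hue="kind",
                palette=qualitative, ax=ax)
```

In a notebook, sns.palplot(pal) or simply displaying the palette object shows the swatches directly.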

6. Styles and Contexts

  • Styles: Control background and grid aesthetics.
      • sns.set_style("whitegrid") (also darkgrid, white, dark, ticks).
  • Context: Scale fonts and line widths for different output media.
      • sns.set_context("paper"): Smaller fonts for printed reports.
      • sns.set_context("notebook"): The default scaling.
      • sns.set_context("talk"): Large fonts/lines for presentations.
      • sns.set_context("poster"): Very large for poster presentations.