Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

3.4 Seaborn: Interface & Distributions

Lesson 23 of 32 in the free Data Visualisation and Analytics notes on Siksha Sarovar, written by Rohit Jangra.

Study Deep: Kernel Density Estimation (KDE)

A histogram is sensitive to bin width: change the width and the apparent shape of the distribution changes.

  • The Solution: KDE. It places a smooth Gaussian kernel on every observation and sums them into a continuous probability density curve. The amount of smoothing is set by the bandwidth, which plays a role similar to bin width.
  • The Benefit: It can give a clearer view of the "multimodality" (multiple peaks) of your data, which histogram bins might hide.
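To make the idea of "smoothing with a Gaussian kernel" concrete, here is a minimal hand-rolled sketch using only NumPy. The function name gaussian_kde_curve, the synthetic ages sample and the bandwidth of 2.0 are all assumptions chosen for illustration; this is not how Seaborn implements KDE internally.

```python
import numpy as np

def gaussian_kde_curve(samples, grid, bandwidth):
    """Average one Gaussian bump per observation, evaluated on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z[i, j] = scaled distance from grid point i to observation j
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    # Averaging the bumps keeps the total area under the curve at 1
    return bumps.mean(axis=1)

rng = np.random.default_rng(0)
# Hypothetical bimodal sample: two groups centred on 20 and 50
ages = np.concatenate([rng.normal(20, 3, 300), rng.normal(50, 5, 300)])

grid = np.linspace(0, 80, 801)  # step of 0.1
density = gaussian_kde_curve(ages, grid, bandwidth=2.0)

# Riemann-sum check that the curve behaves like a probability density
area = float(density.sum() * (grid[1] - grid[0]))
print("area under curve:", round(area, 3))
print("density at 20, 35, 50:", density[200], density[350], density[500])
```

The curve is high near both group centres and low in the valley between them, which is the multimodality described above. A larger bandwidth would smooth the two peaks towards each other.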

Seaborn: Statistical Data Visualization

1. The Seaborn Philosophy

Seaborn is built on top of Matplotlib but designed for statistical exploration.

  • Tidy Data: It loves Pandas DataFrames where each column is a variable and each row is an observation.
  • Less Code: Complex statistical aggregations (like error bars, smoothing) are done automatically.
  • Beautiful Defaults: Publication-quality aesthetics out of the box.
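The tidy-data and "less code" points can be sketched as follows. The table of scores and the column names (section, score) are hypothetical, and the Agg backend is selected only so the script runs without a display; drop that line in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import pandas as pd
import seaborn as sns

# Hypothetical wide table: one column per section
wide = pd.DataFrame({
    "section_a": [62, 71, 68, 75],
    "section_b": [80, 77, 85, 79],
})

# Tidy (long) form: each row is one observation, each column one variable
tidy = wide.melt(var_name="section", value_name="score")
print(tidy.head())

# One call: Seaborn computes the mean per section and draws error bars itself
ax = sns.barplot(data=tidy, x="section", y="score")
ax.set_title("Mean score per section")
```

Doing the same with plain Matplotlib would need a manual groupby, a mean and an interval calculation before any bar is drawn.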

2. Matplotlib vs. Seaborn: When to Use Which?

| Feature | Matplotlib | Seaborn |
| --- | --- | --- |
| Level | Low-level (full control) | High-level (abstractions) |
| Input | Lists, arrays | Pandas DataFrames |
| Defaults | Basic (needs manual styling) | Beautiful out of the box |
| Best For | Custom charts, pixel-perfect control | Statistical exploration, quick EDA |
| Statistical Features | Limited (histograms, box plots); no regression or CI helpers | Regression lines, CI bands, KDE |
| Learning Curve | Steeper | Gentler for common plots |
| Relationship | Foundation | Built on top of Matplotlib |

Rule of Thumb: Start with Seaborn for exploration. Switch to Matplotlib when you need pixel-perfect customization.

3. The "Big Three" Figure-Level Functions

Seaborn groups most of its plots under three figure-level functions. Each one selects a specific plot through its kind parameter, so learning these three covers most everyday needs.

| Interface Function | Purpose | Underlying Plots (kind=...) |
| --- | --- | --- |
| sns.relplot() | Relationship between variables | scatter (Scatterplot), line (Lineplot) |
| sns.displot() | Distribution of data | hist (Histogram), kde (Density), ecdf (Cumulative) |
| sns.catplot() | Categorical comparisons | strip, swarm, box, violin, bar, count |

4. Visualizing Distributions (displot)

Understanding the shape of your data is the first step in analytics.

A. Histogram (kind='hist'):

  • Bins data into buckets and counts them.
  • sns.displot(data=df, x="age", bins=20)
  • Key Parameter: bins — Too few bins oversimplify; too many fragment the pattern.

B. KDE Plot (kind='kde'):

  • Kernel Density Estimation: A smooth, continuous curve estimated from the data.
  • Great for seeing distribution "shapes" (bell curves, bimodal distributions) without the blockiness of histograms.
  • sns.displot(data=df, x="age", kind="kde", fill=True)

C. ECDF Plot (kind='ecdf'):

  • Empirical Cumulative Distribution Function: Shows the proportion of data at or below each value.
  • X-axis = data value, Y-axis = proportion (0 to 1).
  • Great for comparing distributions across groups.

D. Rug Plot (sns.rugplot):

  • Draws a tiny tick mark for every single data point. Often combined with KDE to show where the raw data actually sits.