Seaborn: Statistical Data Visualization
Definition: Seaborn is a Python visualization library built on top of Matplotlib that provides a high-level interface for drawing attractive, informative statistical graphics. It works directly with pandas DataFrames, so complex plots usually take only a few lines of code.
import seaborn as sns
---
Matplotlib vs Seaborn
| Feature | Matplotlib | Seaborn |
|---|---|---|
| Level | Low-level (more code) | High-level (less code) |
| Aesthetics | Basic (needs customization) | Beautiful by default |
| Statistical Integration | Manual | Built-in (regression, distributions) |
| DataFrame Support | Basic | Native (pass column names directly) |
| Plot Variety | General-purpose | Statistical-focused |
| Customization | Extremely flexible | Moderate (uses Matplotlib underneath) |
| Use Case | Custom plots, subplots | Quick EDA, statistical analysis |
---
Types of Seaborn Plots
Relational Plots (Relationships between variables)
| Plot | Function | Best For |
|---|---|---|
| Scatter Plot | sns.scatterplot(data=df, x='col_x', y='col_y') | Relationship between two continuous variables |
| Line Plot | sns.lineplot(data=df, x='col_x', y='col_y') | Trends over time or another ordered variable |
| Relplot | sns.relplot(data=df, x='col_x', y='col_y', hue='col_h') | Figure-level relational plot with facets (col=, row=) |
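As a rough sketch of how an axes-level function (scatterplot) differs from the figure-level relplot, the example below uses a small synthetic DataFrame. The column names hours, score, and group are invented for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data; the column names are illustrative assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "hours": rng.uniform(0, 10, 60),
    "group": np.tile(["A", "B"], 30),
})
df["score"] = 50 + 4 * df["hours"] + rng.normal(0, 5, 60)

# Axes-level function: draws on a single Matplotlib Axes and returns it.
ax = sns.scatterplot(data=df, x="hours", y="score")

# Figure-level function: creates its own figure and returns a FacetGrid.
# col="group" draws one panel per distinct value in the group column.
grid = sns.relplot(data=df, x="hours", y="score", col="group", kind="scatter")
```

The returned Axes can be customized with ordinary Matplotlib calls, while the FacetGrid manages a whole grid of Axes at once.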
Distribution Plots
| Plot | Function | Best For |
|---|---|---|
| Histogram | sns.histplot(data, bins=30) | Distribution shape |
| KDE Plot | sns.kdeplot(data) | Smooth density estimate |
| Displot | sns.displot(data, kde=True) | Figure-level histogram with KDE overlay (replaces the deprecated distplot) |
| Box Plot | sns.boxplot(data=df, x='col_x', y='col_y') | Distribution summary with outliers |
| Violin Plot | sns.violinplot(data=df, x='col_x', y='col_y') | Distribution shape + density |
Categorical Plots
| Plot | Function | Best For |
|---|---|---|
| Bar Plot | sns.barplot(data=df, x='col_x', y='col_y') | Mean of a variable per category (with a confidence interval) |
| Count Plot | sns.countplot(x='col', data=df) | Count of observations per category |
| Strip Plot | sns.stripplot(data=df, x='col_x', y='col_y') | Individual data points by category |
| Swarm Plot | sns.swarmplot(data=df, x='col_x', y='col_y') | Non-overlapping strip plot |
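The following sketch illustrates what barplot and countplot aggregate: barplot shows the mean of a numeric column per category by default, and countplot shows the number of rows per category. The plan and spend columns are made up for this example, and the pandas aggregations are included only to show the equivalent numbers.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Tiny hand-written dataset; column names are illustrative assumptions.
df = pd.DataFrame({
    "plan": ["free", "free", "free", "pro", "pro", "team", "team"],
    "spend": [0.0, 2.0, 4.0, 10.0, 14.0, 28.0, 32.0],
})

fig, (ax_bar, ax_count) = plt.subplots(1, 2, figsize=(8, 3))
sns.barplot(data=df, x="plan", y="spend", ax=ax_bar)   # bar height = mean spend per plan
sns.countplot(data=df, x="plan", ax=ax_count)          # bar height = number of rows per plan

# The same aggregations computed directly with pandas, for comparison.
means = df.groupby("plan")["spend"].mean()
counts = df["plan"].value_counts()
```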
Matrix Plots
| Plot | Function | Best For |
|---|---|---|
| Heatmap | sns.heatmap(corr_matrix, annot=True) | Correlation matrix visualization |
| Cluster Map | sns.clustermap(data) | Hierarchical clustering heatmap |
Regression Plots
| Plot | Function | Best For |
|---|---|---|
| Reg Plot | sns.regplot(data=df, x='col_x', y='col_y') | Scatter + regression line |
| LM Plot | sns.lmplot(data=df, x='col_x', y='col_y', hue='col_h') | Regression with faceting |
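A minimal sketch of regplot on synthetic data follows. Because regplot draws the fitted line but does not return its coefficients, the example recomputes a degree-1 least-squares fit with NumPy to obtain numbers; that extra step is an illustration, not part of Seaborn. The data follows y = 2x + 1 plus noise by construction.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data with a known linear relationship (illustrative assumption).
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 100)
df = pd.DataFrame({"x": x, "y": 2.0 * x + 1.0 + rng.normal(0, 1, 100)})

# Scatter points plus a fitted regression line with a confidence band.
ax = sns.regplot(data=df, x="x", y="y")

# Recompute a straight-line least-squares fit to get the coefficients.
slope, intercept = np.polyfit(df["x"], df["y"], deg=1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

When the fitted parameters or diagnostics matter, a dedicated modeling library is usually a better fit than reading values off the plot.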
---
Key Seaborn Features
1. Hue (Color Grouping)
Add a third variable using color: sns.scatterplot(x='Age', y='Score', hue='Gender', data=df)
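The hue call above can be tried on a small hand-written DataFrame, as in this sketch. The values are invented; only the column names Age, Score, and Gender come from the example in the text.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import pandas as pd
import seaborn as sns

# Illustrative values only.
df = pd.DataFrame({
    "Age": [21, 25, 30, 35, 40, 45],
    "Score": [60, 72, 68, 80, 75, 90],
    "Gender": ["F", "M", "F", "M", "F", "M"],
})

# Each distinct Gender value gets its own color and a legend entry.
ax = sns.scatterplot(data=df, x="Age", y="Score", hue="Gender")
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```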
2. Built-in Themes
sns.set_theme(style="darkgrid") # darkgrid, whitegrid, dark, white, ticks
sns.set_palette("pastel") # Color palette
3. Color Palettes
| Palette | Type | Best For |
|---|---|---|
| "deep" | Qualitative | Default, distinct categories |
| "pastel" | Qualitative | Soft, presentation-friendly |
| "coolwarm" | Diverging | Correlation heatmaps |
| "viridis" | Sequential | Ordered data |
| "Set2" | Qualitative | Colorblind-friendly |
---
Correlation Heatmap (Most Common in EDA)
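import matplotlib.pyplot as plt
corr = df.corr(numeric_only=True) # Correlation matrix of the numeric columns (df is an existing DataFrame)
sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1) # Fix the color scale to the full [-1, 1] range
plt.title("Feature Correlation Heatmap")
plt.show()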
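This is one of the most widely used visualizations in Exploratory Data Analysis. Each cell shows the pairwise correlation between two numeric features (Pearson by default in df.corr()): values near +1 or -1 indicate a strong positive or negative linear relationship, and values near 0 indicate little linear relationship.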
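For a version that runs end to end, the sketch below builds a synthetic DataFrame and draws the heatmap. The column names area, price, age, and city are assumptions made for illustration; price is constructed to depend on area, and the text column city is excluded by selecting numeric columns explicitly.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data; all column names are illustrative assumptions.
rng = np.random.default_rng(3)
n = 200
area = rng.normal(100, 20, n)
df = pd.DataFrame({
    "area": area,
    "price": 3 * area + rng.normal(0, 10, n),  # built to correlate strongly with area
    "age": rng.normal(30, 5, n),               # generated independently of the others
    "city": np.tile(["X", "Y"], n // 2),       # non-numeric, left out of the matrix
})

# Keep only numeric columns before computing pairwise correlations.
corr = df.select_dtypes("number").corr()

ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
ax.set_title("Feature Correlation Heatmap")
```

Fixing vmin and vmax to -1 and 1 keeps the neutral color of the diverging colormap at zero correlation, which makes heatmaps comparable across datasets.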
---
Seaborn in the Data Science Workflow
| Stage | How Seaborn Helps |
|---|---|
| EDA | Quickly visualize distributions, correlations, outliers |
| Feature Selection | Heatmaps reveal correlated features |
| Model Evaluation | Plot predicted vs actual values |
| Presentation | Beautiful plots for non-technical stakeholders |
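As one possible take on the Model Evaluation row, the sketch below plots predicted against actual values with a y = x reference line. No real model is involved: the predicted column is simulated by adding noise to the actual values, purely as a stand-in.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Simulated results; "predicted" is actual plus noise, not output from a real model.
rng = np.random.default_rng(4)
actual = rng.uniform(0, 100, 80)
results = pd.DataFrame({
    "actual": actual,
    "predicted": actual + rng.normal(0, 8, 80),
})

ax = sns.scatterplot(data=results, x="actual", y="predicted")

# Points on the dashed y = x line would be perfect predictions.
lims = [results.min().min(), results.max().max()]
ax.plot(lims, lims, linestyle="--", color="gray")

residuals = results["predicted"] - results["actual"]
```

Plotting the residuals with histplot is a natural follow-up for checking whether the errors are centered on zero.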
---
Summary
- Seaborn builds on Matplotlib to provide beautiful statistical visualizations.
- It integrates natively with Pandas DataFrames.
- Key plots: scatter, box, violin, heatmap, pairplot, and regression plots.
- The hue parameter adds a third categorical dimension using color.
- Correlation heatmaps are essential for feature selection in machine learning.