Matplotlib: Data Visualization
Definition: Matplotlib is one of the most widely used Python libraries for creating static, animated, and interactive visualizations. It provides fine-grained control over every aspect of a plot, from colors and labels to axes and legends.
import matplotlib.pyplot as plt
---
Why Matplotlib?
| Feature | Benefit |
|---|---|
| Versatility | Create virtually any type of chart |
| Customization | Full control over every plot element |
| Integration | Works with NumPy, Pandas, Seaborn |
| Publication Quality | Produces plots suitable for research papers |
| Foundation | Seaborn and Pandas plotting are built on top of Matplotlib |
---
The Anatomy of a Matplotlib Plot
A Matplotlib plot consists of:
- Figure — The overall window/page.
- Axes — The actual plot area (a Figure can have multiple Axes).
- Title — The heading of the plot.
- Labels — X and Y axis labels.
- Legend — Identifies different data series.
- Ticks — The marks along the axes.
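
The parts listed above can be seen together in one small figure. The following is a minimal sketch using Matplotlib's object-oriented interface (`plt.subplots`, `ax.set_title`, and so on); the series names "Product A" and "Product B" and their values are illustrative assumptions, and the `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt

# Figure: the overall canvas. Axes: the plot area inside it.
fig, ax = plt.subplots(figsize=(6, 4))

# Two illustrative data series so the legend has something to identify
months = [1, 2, 3, 4, 5]
ax.plot(months, [10, 20, 25, 30, 40], label="Product A")
ax.plot(months, [5, 15, 20, 22, 35], label="Product B")

ax.set_title("Sales Over Time")  # Title: heading of the plot
ax.set_xlabel("Month")           # Labels: x axis
ax.set_ylabel("Sales")           # Labels: y axis
ax.set_xticks(months)            # Ticks: one mark per month
ax.legend()                      # Legend: built from the label= values
```

With an interactive backend, calling `plt.show()` after these lines would display the figure.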
---
Types of Plots
| Plot Type | Function | Best For |
|---|---|---|
| Line Plot | plt.plot(x, y) | Trends over time (time series) |
| Bar Chart | plt.bar(x, y) | Comparing categories |
| Horizontal Bar | plt.barh(y, width) | Long category names |
| Histogram | plt.hist(data, bins=10) | Distribution of continuous data |
| Scatter Plot | plt.scatter(x, y) | Relationship between two variables |
| Pie Chart | plt.pie(sizes, labels=labels) | Proportions of a whole |
| Box Plot | plt.boxplot(data) | Distribution summary (median, quartiles, outliers) |
| Heatmap | plt.imshow(data) | Matrix/correlation visualization |
---
Creating Plots
Line Plot:
x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 40]
plt.plot(x, y, color='blue', linestyle='--', marker='o')
plt.title("Sales Over Time")
plt.xlabel("Month")
plt.ylabel("Sales (₹)")
plt.grid(True)
plt.show()
Bar Chart:
categories = ['A', 'B', 'C', 'D']
values = [25, 40, 30, 55]
plt.bar(categories, values, color=['red', 'green', 'blue', 'orange'])
plt.title("Category Comparison")
plt.show()
Histogram:
import numpy as np
data = np.random.randn(1000)
plt.hist(data, bins=30, color='skyblue', edgecolor='black')
plt.title("Distribution of Random Data")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
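
The same pattern extends to other plot types from the table, such as scatter plots and box plots. The sketch below is one possible illustration, not a definitive recipe: the data is randomly generated with a fixed seed, and the variable names (`hours`, `scores`, `groups`) and group labels are hypothetical. The `Agg` backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded so the illustrative data is reproducible

# Scatter plot: relationship between two variables
hours = rng.uniform(0, 10, size=50)
scores = 40 + 5 * hours + rng.normal(0, 5, size=50)
scatter_fig = plt.figure()
points = plt.scatter(hours, scores, color="teal", alpha=0.7)
plt.title("Study Hours vs. Score")
plt.xlabel("Hours")
plt.ylabel("Score")

# Box plot: one box per group, showing median, quartiles, and outliers
groups = [rng.normal(50, 10, 100), rng.normal(60, 5, 100), rng.normal(55, 15, 100)]
box_fig = plt.figure()
boxes = plt.boxplot(groups)
plt.xticks([1, 2, 3], ["Group A", "Group B", "Group C"])
plt.title("Score Distribution by Group")
```

Each `plt.figure()` call starts a new figure, so the two plots do not draw on top of each other.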
---
Customization Options
| Element | Code | Description |
|---|---|---|
| Title | plt.title("Title") | Plot title |
| X Label | plt.xlabel("X Axis") | X-axis label |
| Y Label | plt.ylabel("Y Axis") | Y-axis label |
| Legend | plt.legend() | Show legend (each series needs a label= argument) |
| Grid | plt.grid(True) | Show grid lines |
| Line Color | color='red' | Change line color |
| Line Style | linestyle='--' | Dashed, dotted, etc. |
| Marker | marker='o' | Data point markers |
| Figure Size | plt.figure(figsize=(10, 6)) | Width × Height in inches |
| Save | plt.savefig("plot.png", dpi=300) | Save to file (call before plt.show()) |
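
To show how these options combine, here is a minimal sketch that applies most of the table to a single plot. The two series and their labels ("2023", "2024") are illustrative assumptions; the `Agg` backend line is only there so the code runs without a display.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]

fig = plt.figure(figsize=(10, 6))  # width and height in inches

# label= gives plt.legend() a name for each series
plt.plot(x, [10, 20, 25, 30, 40], color="red", linestyle="--", marker="o", label="2023")
plt.plot(x, [12, 18, 28, 33, 45], color="blue", linestyle=":", marker="s", label="2024")

plt.title("Title")
plt.xlabel("X Axis")
plt.ylabel("Y Axis")
legend = plt.legend()
plt.grid(True)

# Save before any plt.show() call, since the figure may be cleared once shown
out_path = Path("plot.png")
plt.savefig(out_path, dpi=300)
```

Without the `label=` arguments, `plt.legend()` has nothing to display and Matplotlib warns that no labeled artists were found.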
---
Subplots (Multiple Plots)
Create multiple plots in a single figure:
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
axes[0].plot([1, 2, 3], [10, 20, 30])
axes[0].set_title("Line Plot")
axes[1].bar(['A', 'B', 'C'], [5, 8, 3])
axes[1].set_title("Bar Chart")
plt.tight_layout()
plt.show()
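
When `plt.subplots` is given more than one row and more than one column, it returns a 2D NumPy array of Axes that is indexed as `axes[row, col]`. The sketch below assumes a 2×2 grid with made-up data; the `Agg` backend line is only there so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded illustrative data

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

axes[0, 0].plot([1, 2, 3], [10, 20, 30])
axes[0, 0].set_title("Line Plot")

axes[0, 1].bar(["A", "B", "C"], [5, 8, 3])
axes[0, 1].set_title("Bar Chart")

axes[1, 0].hist(rng.normal(size=500), bins=20)
axes[1, 0].set_title("Histogram")

axes[1, 1].scatter(rng.uniform(size=30), rng.uniform(size=30))
axes[1, 1].set_title("Scatter Plot")

# axes.flat iterates over every Axes in the grid, row by row
for ax in axes.flat:
    ax.grid(True)

fig.suptitle("Four Plot Types in One Figure")
fig.tight_layout()  # avoid overlapping titles and tick labels
```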
---
Matplotlib with Pandas
Pandas DataFrames have built-in plotting that uses Matplotlib. The calls below assume an existing DataFrame df with Name, Age, and Score columns:
df['Score'].plot(kind='hist', bins=10, title='Score Distribution')
df.plot(x='Name', y='Score', kind='bar')
df.plot.scatter(x='Age', y='Score')
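
Because Pandas plotting uses Matplotlib, the plotting methods return a Matplotlib Axes object that can be customized further, and they accept an `ax=` argument to draw onto an existing Axes. The following sketch assumes a small hypothetical DataFrame with `Name`, `Age`, and `Score` columns (the values are made up for illustration); the `Agg` backend line is only there so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dara"],
    "Age": [21, 25, 30, 28],
    "Score": [72, 85, 90, 64],
})

# DataFrame.plot returns a Matplotlib Axes, so the usual setters apply
bar_ax = df.plot(x="Name", y="Score", kind="bar", legend=False)
bar_ax.set_ylabel("Score")
bar_ax.set_title("Score by Student")

# Pass ax= to place a Pandas plot inside a figure created with Matplotlib
fig, (left, right) = plt.subplots(1, 2, figsize=(12, 5))
df["Score"].plot(kind="hist", bins=10, title="Score Distribution", ax=left)
df.plot.scatter(x="Age", y="Score", ax=right)
fig.tight_layout()
```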
---
When to Use Matplotlib
| Scenario | Use Matplotlib? |
|---|---|
| Quick exploratory plots | ✅ Yes |
| Publication-quality static plots | ✅ Yes |
| Full customization needed | ✅ Yes |
| Beautiful statistical plots | Use Seaborn (built on Matplotlib) |
| Interactive dashboards | Use Plotly or Dash |
---
Summary
- Matplotlib is the foundational visualization library in Python.
- plt.plot(), plt.bar(), plt.hist(), and plt.scatter() are the most common plot types.
- Every element of a plot (title, labels, colors, markers, grid) can be customized.
- Subplots allow multiple plots in one figure.
- Pandas integrates with Matplotlib for quick DataFrame plotting.