Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

3.2 Matplotlib Basics

Lesson 21 of 32 in the free Data Visualisation and Analytics notes on Siksha Sarovar, written by Rohit Jangra.

Data Visualization with Matplotlib: The Foundation

Study Deep: Tufte's Data-Ink Ratio

Edward Tufte, a pioneer in data visualization, proposed the Data-Ink Ratio:

  • The Rule: Data-Ink Ratio = ink used to show data / total ink used in the whole graphic.
  • The Goal: Maximize this ratio. Remove "chart junk" like heavy gridlines, 3D effects, and unnecessary borders. If an ink mark doesn't represent data, it shouldn't be there.

1. Introduction to Architecture

Matplotlib is the foundational library for visualization in Python. Understanding its architecture is key to mastering it.

  • Figure: The top-level container (The "Window" or "Page"). Created with plt.figure().
  • Axes: The area where data is plotted (The "Graph"). A Figure can have multiple Axes.
  • Axis: The number-line-like objects (x-axis, y-axis) with ticks and labels.
  • Artist: Everything you see (lines, text, rectangles, patches) is an Artist object. The Figure, Axes, and Axis are themselves Artists.
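The hierarchy above can be inspected directly. The following is a minimal sketch, not part of the original notes; the data values are made up, and the Agg backend is selected only as an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
from matplotlib.artist import Artist

fig = plt.figure(figsize=(6, 4))          # Figure: the top-level container
ax = fig.add_subplot(1, 1, 1)             # Axes: the plotting area inside the Figure
(line,) = ax.plot([1, 2, 3], [2, 4, 8])   # plot() returns a list of line artists

# The Figure keeps a list of the Axes it contains.
print(fig.axes == [ax])

# Each Axes owns two Axis objects (the number lines with ticks and labels).
print(type(ax.xaxis).__name__, type(ax.yaxis).__name__)

# Figure, Axes, Axis and the plotted line are all Artist subclasses.
print(all(isinstance(obj, Artist) for obj in (fig, ax, ax.xaxis, line)))
```

In an interactive session, leave out the `matplotlib.use("Agg")` line and call `plt.show()` at the end to see the figure.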

Two Interfaces:

Interface | Description | When to Use
pyplot (plt) | Quick, MATLAB-style state-based interface | Quick plots, single graphs, exploration
Object-Oriented (fig, ax) | Explicit control over Figure and Axes objects | Multi-panel figures, publication-quality plots
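To make the difference concrete, here is a small sketch of both interfaces. The data and titles are illustrative only, and the Agg backend is an assumption made so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# pyplot interface: every call acts on the "current" Figure and Axes.
plt.figure()
plt.plot(x, y)
plt.title("pyplot interface")
implicit_ax = plt.gca()  # the Axes that pyplot has been drawing on

# Object-oriented interface: keep references and call methods on them.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))
ax_left.plot(x, y)
ax_left.set_title("Line")
ax_right.bar(x, y)
ax_right.set_title("Bars")
fig.tight_layout()
```

With explicit `ax_left` and `ax_right` references there is never any doubt about which panel a call targets, which is why the object-oriented style tends to scale better to multi-panel figures.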

2. Fundamental Plots

Plot Type | Function | Best Use Case | Data Type
Line Graph | plt.plot(x, y) | Trends over time (Time Series) | Continuous
Bar Chart | plt.bar(x, height) | Comparing categorical quantities | Categorical vs. Numerical
Scatter Plot | plt.scatter(x, y) | Relationships/Correlations | Continuous vs. Continuous
Histogram | plt.hist(x, bins) | Distribution of a single variable | Continuous
Pie Chart | plt.pie(x) | Composition of a whole. Use sparingly; bar charts are usually easier to compare. | Proportions
Box Plot | plt.boxplot(x) | Distribution + Outliers | Continuous (by group)
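As an illustration, the six plot types in the table can be drawn side by side on one Figure. This is a sketch with made-up data (the sales, counts, heights and weights are invented for the example); it uses the Axes methods that correspond to the pyplot functions listed above, and the Agg backend is an assumption so it runs without a display.

```python
import random
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

random.seed(0)  # reproducible made-up data
years = [2019, 2020, 2021, 2022, 2023]
sales = [10, 14, 13, 18, 22]
categories = ["A", "B", "C"]
counts = [5, 9, 3]
heights = [random.gauss(170, 8) for _ in range(200)]
weights = [0.5 * h + random.gauss(-15, 5) for h in heights]

fig, axes = plt.subplots(2, 3, figsize=(12, 7))

axes[0, 0].plot(years, sales, marker="o")      # trend over time
axes[0, 0].set_xticks(years)
axes[0, 0].set_title("Line graph")

axes[0, 1].bar(categories, counts)             # categorical comparison
axes[0, 1].set_title("Bar chart")

axes[0, 2].scatter(heights, weights, s=10)     # relationship between two variables
axes[0, 2].set_title("Scatter plot")

axes[1, 0].hist(heights, bins=20)              # distribution of one variable
axes[1, 0].set_title("Histogram")

axes[1, 1].pie(counts, labels=categories)      # composition of a whole
axes[1, 1].set_title("Pie chart")

axes[1, 2].boxplot([heights[:100], heights[100:]])  # distribution by group
axes[1, 2].set_xticklabels(["Group 1", "Group 2"])
axes[1, 2].set_title("Box plot")

fig.tight_layout()
```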

3. Chart Selection Decision Guide

Your Question | Best Chart | Why
How does X change over time? | Line Chart | Shows continuous trends
How do categories compare? | Bar Chart | Easy to compare heights
Is there a relationship between X and Y? | Scatter Plot | Reveals correlation patterns
What is the distribution of X? | Histogram / Box Plot | Shows shape, spread, outliers
What is the composition? | Stacked Bar / Pie | Shows parts of a whole
How do groups compare on distribution? | Box Plot / Violin | Shows median, quartiles, outliers
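The stacked bar recommended for composition questions has no dedicated function in the table of fundamental plots. One common way to build it, sketched below with invented sales figures, is to call `bar` twice and pass the first series as the `bottom` of the second. The Agg backend is an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

# Made-up data: units sold per quarter, split by channel.
quarters = ["Q1", "Q2", "Q3", "Q4"]
online = [20, 25, 30, 35]
in_store = [30, 28, 25, 20]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(quarters, online, label="Online")
# bottom=online starts each second-series bar where the first one ends.
ax.bar(quarters, in_store, bottom=online, label="In store")
ax.set_ylabel("Units sold")
ax.set_title("Sales composition by channel")
ax.legend()
```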

4. The Data-Ink Ratio (Edward Tufte's Principle)

Data-Ink Ratio = Data Ink / Total Ink Used in the Chart

  • Maximize this ratio by removing all non-essential elements.
  • Remove unnecessary borders, gridlines, and decorations.
  • Every drop of ink should represent data, not decoration.
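A rough sketch of applying these points in Matplotlib follows. Which elements count as non-essential is a judgment call; here the top and right borders are hidden and only faint horizontal gridlines are kept. The data is illustrative, and the Agg backend is an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 50]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, marker="o")

# Hide the top and right borders (spines); they carry no data.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

# Keep only light horizontal gridlines, drawn behind the data.
ax.grid(axis="y", color="0.85", linewidth=0.8)
ax.set_axisbelow(True)

ax.set_xlabel("Years")
ax.set_ylabel("Value ($)")
```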

5. Code Example: Anatomy of a Basic Plot

import matplotlib.pyplot as plt

# Data
x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 50]

# 1. Create the Figure (pyplot creates the Axes implicitly on the first plotting call)
plt.figure(figsize=(10, 6))  # Width: 10 inches, Height: 6 inches

# 2. Plot Data
plt.plot(x, y, color='green', linestyle='--', marker='o')

# 3. Add Labels (The "Artist" layer)
plt.title("Growth Over Time")
plt.xlabel("Years")
plt.ylabel("Value ($)")
plt.grid(True) # Add gridlines

# 4. Show
plt.show()