NumPy (Numerical Python)
Definition: NumPy is the foundational library for numerical computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a vast collection of high-level mathematical functions to operate on these arrays efficiently.
```python
import numpy as np
```
---
Why NumPy?
| Feature | Python Lists | NumPy Arrays |
|---|---|---|
| Speed | Slow (interpreted loops) | Fast (C-based, vectorized) |
| Memory | More memory per element | Compact storage |
| Operations | Element-by-element loops needed | Vectorized (whole-array operations) |
| Data Types | Mixed types allowed | Homogeneous (single type) |
| Broadcasting | Not supported | Supported |
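For numerical operations on large arrays, NumPy is often tens of times faster than equivalent Python list code. The exact speedup depends on the operation, the data type, and the array size.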
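A rough way to check the speed difference on your own machine is to time the same computation both ways. The sketch below sums squares over a million integers, once with a Python generator and once with NumPy. The array size and repeat count are arbitrary choices, and the measured speedup will vary with hardware and NumPy version. The last lines illustrate the "homogeneous type" row of the table.

```python
import timeit

import numpy as np

n = 1_000_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)


def list_sum_of_squares():
    # Interpreted loop: every element is a boxed Python int
    return sum(x * x for x in py_list)


def array_sum_of_squares():
    # Vectorized: squaring and summing run in compiled code
    return int((np_arr * np_arr).sum())


# Both versions must agree before the timings mean anything
assert list_sum_of_squares() == array_sum_of_squares()

t_list = timeit.timeit(list_sum_of_squares, number=3)
t_arr = timeit.timeit(array_sum_of_squares, number=3)
print(f"list: {t_list:.3f}s  array: {t_arr:.3f}s  speedup: {t_list / t_arr:.1f}x")

# Homogeneous types: mixing ints and floats upcasts the whole array
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```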
---
Creating Arrays
| Method | Code | Result |
|---|---|---|
| From list | np.array([1, 2, 3]) | [1 2 3] |
| Zeros | np.zeros((2, 3)) | 2×3 matrix of zeros |
| Ones | np.ones((3, 3)) | 3×3 matrix of ones |
| Range | np.arange(0, 10, 2) | [0 2 4 6 8] |
| Linspace | np.linspace(0, 1, 5) | [0 0.25 0.5 0.75 1] |
| Random | np.random.rand(3, 3) | 3×3 uniform random values in [0, 1) |
| Identity | np.eye(3) | 3×3 identity matrix |
| Full | np.full((2, 2), 7) | 2×2 matrix filled with 7 |
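The constructors in the table can be tried together as shown below. The variable names are illustrative. The `np.random.default_rng` Generator at the end is the random API NumPy's documentation recommends for new code. Like `np.random.rand`, its `random` method draws from [0, 1).

```python
import numpy as np

a = np.array([1, 2, 3])        # from a Python list
z = np.zeros((2, 3))           # the shape is passed as a tuple; dtype float64
o = np.ones((3, 3))
r = np.arange(0, 10, 2)        # the stop value 10 is excluded
lin = np.linspace(0, 1, 5)     # 5 evenly spaced points, endpoint included
u = np.random.rand(3, 3)       # legacy API: uniform samples in [0, 1)
ident = np.eye(3)              # ones on the diagonal, zeros elsewhere
f = np.full((2, 2), 7)         # every element set to 7

# Generator-based alternative to np.random.rand; seeding makes runs reproducible
rng = np.random.default_rng(seed=42)
g = rng.random((3, 3))         # also uniform in [0, 1)

for name, arr in [("zeros", z), ("arange", r), ("linspace", lin), ("eye", ident), ("full", f)]:
    print(name, arr.shape, arr.dtype)
```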
---
Array Properties
| Property | Code | Description |
|---|---|---|
| shape | arr.shape | Dimensions (e.g., (3, 4)) |
| ndim | arr.ndim | Number of dimensions |
| size | arr.size | Total number of elements |
| dtype | arr.dtype | Data type (int64, float64) |
| itemsize | arr.itemsize | Bytes per element |
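Inspecting a small 3×4 float array makes the table concrete. `nbytes`, the total size of the data buffer, is an extra attribute that is not listed in the table.

```python
import numpy as np

arr = np.arange(12, dtype=np.float64).reshape(3, 4)

print(arr.shape)     # (3, 4)
print(arr.ndim)      # 2
print(arr.size)      # 12
print(arr.dtype)     # float64
print(arr.itemsize)  # 8 (bytes per float64)
print(arr.nbytes)    # 96 = size * itemsize
```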
---
Array Operations (Vectorized)
NumPy applies operations to entire arrays at once, without explicit Python loops. The looping still happens, but inside compiled C code. This is called vectorization.
| Operation | Code | Result |
|---|---|---|
| Addition | a + b | Element-wise addition |
| Multiplication | a * b | Element-wise multiplication |
| Scalar | a * 3 | Multiply all elements by 3 |
| Square | a ** 2 | Square each element |
| Square Root | np.sqrt(a) | Square root of each element |
| Sum | np.sum(a) or a.sum() | Sum of all elements |
| Mean | np.mean(a) | Average |
| Std Dev | np.std(a) | Population standard deviation (pass ddof=1 for the sample version) |
| Min/Max | a.min(), a.max() | Minimum, Maximum |
| Dot Product | np.dot(a, b) or a @ b | Dot product for 1-D arrays; matrix multiplication for 2-D arrays |
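The operations in the table can be checked on two small arrays with arbitrary values. The example also contrasts `*`, which is element-wise, with `@`, the matrix-multiplication operator. Confusing the two is a common bug.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

added = a + b             # element-wise: 11, 22, 33, 44
scaled = a * 3            # the scalar is applied to every element
squared = a ** 2
roots = np.sqrt(a)
print(a.sum(), a.mean(), a.min(), a.max())  # 10.0 2.5 1.0 4.0
print(a.std(), a.std(ddof=1))  # population vs sample standard deviation
dot = np.dot(a, b)        # 1-D inputs give a scalar: 10 + 40 + 90 + 160 = 300.0

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
elementwise = A * B       # [[5, 12], [21, 32]]
matmul = A @ B            # [[19, 22], [43, 50]], same as np.dot(A, B) for 2-D
print(elementwise)
print(matmul)
```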
---
Indexing and Slicing
```python
arr = np.array([10, 20, 30, 40, 50])
arr[0]    # 10 (first element)
arr[-1]   # 50 (last element)
arr[1:4]  # [20 30 40] (slice)
arr[::2]  # [10 30 50] (every 2nd element)
```
2D Array Indexing:
```python
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
matrix[0, 0]     # 1 (row 0, col 0)
matrix[1, :]     # [4 5 6] (entire row 1)
matrix[:, 2]     # [3 6 9] (entire column 2)
matrix[0:2, 1:]  # [[2 3], [5 6]] (submatrix)
```
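One consequence of slicing is that basic slices are views that share memory with the original array. Writing to a slice therefore changes the source. The sketch below demonstrates this and the `.copy()` method that avoids it. The variable names are illustrative.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

window = arr[1:4]              # basic slicing returns a view, not a copy
window[0] = 99                 # writes through to arr
print(arr)                     # [10 99 30 40 50]
print(np.shares_memory(window, arr))  # True

independent = arr[1:4].copy()  # an explicit copy owns its own data
independent[0] = -1
print(arr)                     # still [10 99 30 40 50]
```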
---
Reshaping Arrays
| Method | Code | Description |
|---|---|---|
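| Reshape | arr.reshape(3, 4) | Change shape without changing data (the total number of elements must stay the same) |
| Flatten | arr.flatten() | Return a 1D copy of the array |
| Transpose | arr.T | Swap rows and columns (reverses the axis order for n-D arrays) |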
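Here is a short sketch of the reshaping methods on a 12-element array. It also covers two details that are easy to miss. A `-1` placeholder asks NumPy to infer one dimension, and a `ValueError` is raised when the requested shape needs a different number of elements.

```python
import numpy as np

arr = np.arange(12)          # [0 1 ... 11], shape (12,)
m = arr.reshape(3, 4)        # 3 * 4 must equal arr.size
m2 = arr.reshape(2, -1)      # -1 lets NumPy infer the missing dimension (6)
flat = m.flatten()           # always a new 1-D copy
t = m.T                      # shape (4, 3); a view, no data copied
print(m.shape, m2.shape, flat.shape, t.shape)  # (3, 4) (2, 6) (12,) (4, 3)

flat[0] = 100                # modifying the copy leaves m untouched
print(m[0, 0])               # 0

failed = False
try:
    arr.reshape(5, 5)        # 25 elements requested, only 12 available
except ValueError as exc:
    failed = True
    print("reshape failed:", exc)
```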
---
Broadcasting
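Broadcasting allows NumPy to perform operations on arrays of different shapes by virtually stretching the smaller array to match the larger one, without copying its data. Shapes are compared from the trailing dimension backward. Two dimensions are compatible when they are equal or when one of them is 1.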
```python
a = np.array([[1, 2, 3], [4, 5, 6]])  # Shape: (2, 3)
b = np.array([10, 20, 30])            # Shape: (3,)
result = a + b
# [[11 22 33], [14 25 36]]: b is broadcast across each row
```
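The trailing-dimension rule can be seen with a column vector and with a pair of shapes that cannot be broadcast together. The example ends with `np.broadcast_shapes`, which computes the result shape without allocating any arrays. That function requires NumPy 1.20 or later.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
col = np.array([[100], [200]])         # shape (2, 1): stretched across the 3 columns
print(a + col)                         # [[101 102 103]
                                       #  [204 205 206]]

# (2, 3) vs (2,): trailing dimensions are 3 and 2, and neither is 1, so this fails
try:
    a + np.array([1, 2])
    broadcast_ok = True
except ValueError as exc:
    broadcast_ok = False
    print("cannot broadcast:", exc)

print(np.broadcast_shapes((2, 3), (3,)))    # (2, 3)
print(np.broadcast_shapes((2, 1), (1, 4)))  # (2, 4)
```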
---
NumPy in Data Science
| Application | How NumPy is Used |
|---|---|
| Linear Algebra | Matrix operations, eigenvalues, SVD |
| Image Processing | Images as pixel arrays (H × W × 3 for RGB) |
| Machine Learning | Feature matrices, weight updates |
| Statistical Analysis | Mean, median, variance, correlations |
| Signal Processing | Fourier transforms (np.fft) |
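Several of these applications come together in a typical machine-learning preprocessing step. The sketch below uses a small made-up feature matrix. It first standardizes each column using `axis=0` statistics and broadcasting. It then fits a linear model with `np.linalg.lstsq`. The data and the noise-free targets are hypothetical, chosen so the fitted coefficients can be checked by eye.

```python
import numpy as np

# Hypothetical feature matrix: 5 samples (rows) x 2 features (columns)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
true_w = np.array([2.0, -1.0])
y = X @ true_w + 0.5           # synthetic targets: weights (2, -1), intercept 0.5, no noise

# Statistical analysis: column-wise standardization via axis=0 and broadcasting
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Linear algebra: least-squares fit of y = X w + b
X_design = np.column_stack([X, np.ones(len(X))])  # extra column of ones models b
coef, residuals, rank, sv = np.linalg.lstsq(X_design, y, rcond=None)
print(coef)                    # approximately [2, -1, 0.5]
```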
---
Summary
- NumPy is the foundation of numerical computing in Python.
- Arrays are faster and more memory-efficient than Python lists.
- Vectorized operations eliminate the need for explicit loops.
- Broadcasting allows operations on arrays of different shapes.
- Pandas and scikit-learn are built on NumPy arrays, TensorFlow interoperates with them, and NumPy is a core dependency of most of the Python data science stack.