SciPy: Scientific Computing

Lesson 30 of 37 in the free Data Science notes on Siksha Sarovar, written by Rohit Jangra.

SciPy: Scientific & Statistical Computing

Definition: SciPy (Scientific Python) is an open-source library that builds on NumPy to provide additional functionality for scientific and technical computing. It includes modules for optimization, integration, interpolation, linear algebra, signal processing, and statistical testing.

import scipy

---

SciPy vs NumPy

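| Feature | NumPy | SciPy |
| --- | --- | --- |
| Focus | Array operations | Scientific computing |
| Statistics | Basic (mean, std) | Advanced (t-test, chi-square, ANOVA) |
| Linear Algebra | Core operations (solve, inverse, eigenvalues) | Extended (more decompositions, matrix functions, sparse matrices) |
| Optimization | Not available | Curve fitting, minimization |
| Integration | Trapezoidal rule only | Numerical integration (quadrature, ODE solvers) |
| Signal Processing | FFT and basic convolution | Filter design, filtering, convolutions |
| Relationship | Foundation | Built on top of NumPy |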

---

Key SciPy Modules

| Module | Import | Purpose |
| --- | --- | --- |
| scipy.stats | from scipy import stats | Statistical functions & tests |
| scipy.optimize | from scipy import optimize | Optimization & curve fitting |
| scipy.linalg | from scipy import linalg | Linear algebra (beyond NumPy) |
| scipy.integrate | from scipy import integrate | Numerical integration |
| scipy.interpolate | from scipy import interpolate | Interpolation |
| scipy.signal | from scipy import signal | Signal processing |
| scipy.sparse | from scipy import sparse | Sparse matrices |
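
As a quick orientation, the sketch below touches three of the modules from the table. The integrand, the linear system, and the sample points are made up for illustration and are not part of the lesson's data.

```python
import numpy as np
from scipy import integrate, interpolate, linalg

# scipy.integrate: numerically integrate x**2 from 0 to 3 (the exact answer is 9)
area, abs_error = integrate.quad(lambda x: x**2, 0, 3)

# scipy.linalg: solve the linear system A @ x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)

# scipy.interpolate: linear interpolation between known (illustrative) points
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([0.0, 10.0, 20.0])
estimate = float(interpolate.interp1d(xs, ys)(1.5))

print(f"Integral: {area:.4f} (estimated error {abs_error:.1e})")
print("Solution of A @ x = b:", solution)
print("Interpolated value at x = 1.5:", estimate)
```

Each submodule is imported by name, as in the Import column, and all of them accept ordinary NumPy arrays.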

---

scipy.stats — Statistical Functions

Descriptive Statistics

from scipy import stats
import numpy as np

data = [23, 45, 12, 67, 34, 89, 56, 78, 45, 34]

print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Mode:", stats.mode(data))
print("Skewness:", stats.skew(data))
print("Kurtosis:", stats.kurtosis(data))

| Function | Description |
| --- | --- |
| stats.describe(data) | Complete statistical summary |
| stats.skew(data) | Measure of asymmetry |
| stats.kurtosis(data) | Measure of tail heaviness |
| stats.zscore(data) | Z-scores for outlier detection |
| stats.mode(data) | Most frequent value |
| stats.sem(data) | Standard error of the mean |
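
The functions in the table that the earlier snippet did not call can be tried on the same data. This is a minimal sketch; the variable names are chosen here for illustration.

```python
import numpy as np
from scipy import stats

data = np.array([23, 45, 12, 67, 34, 89, 56, 78, 45, 34])

# describe() bundles nobs, minmax, mean, variance (ddof=1), skewness and kurtosis
summary = stats.describe(data)

# zscore() standardizes each value: (x - mean) / std, with ddof=0 by default
z_scores = stats.zscore(data)

# sem() is the sample standard deviation (ddof=1) divided by sqrt(n)
std_error = stats.sem(data)

print("n:", summary.nobs, "min/max:", summary.minmax)
print("mean:", summary.mean, "variance:", summary.variance)
print("largest |z|:", np.abs(z_scores).max())
print("standard error:", std_error)
```

A common rule of thumb flags values with |z| above 3 as potential outliers; in this small sample no value comes close to that threshold.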

---

Hypothesis Testing with SciPy

1. t-Test (Compare Means)

One-Sample t-Test: Is the sample mean different from a known value?

t_stat, p_value = stats.ttest_1samp(data, popmean=50)
print(f"t-statistic: {t_stat:.3f}, p-value: {p_value:.3f}")
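
To see what the one-sample test computes, the statistic can be reproduced by hand as t = (sample mean - popmean) / (s / sqrt(n)), with the two-sided p-value taken from a t distribution with n - 1 degrees of freedom. The sketch below assumes the same data and popmean as above; the names t_manual and p_manual are illustrative.

```python
import numpy as np
from scipy import stats

data = np.array([23, 45, 12, 67, 34, 89, 56, 78, 45, 34])
popmean = 50
n = data.size

# t = (sample mean - hypothesized mean) / standard error of the mean
t_manual = (data.mean() - popmean) / (data.std(ddof=1) / np.sqrt(n))

# Two-sided p-value: twice the upper-tail area beyond |t|
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

result = stats.ttest_1samp(data, popmean=popmean)
print(f"manual: t = {t_manual:.3f}, p = {p_manual:.3f}")
print(f"scipy:  t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```

Both lines should print the same numbers. The p-value here is well above 0.05, so this sample gives no evidence that the mean differs from 50.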

Two-Sample t-Test: Are two groups significantly different?

group_a = [85, 90, 78, 92, 88]
group_b = [70, 75, 68, 80, 72]
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"p-value: {p_value:.4f}")
if p_value < 0.05:
    print("Significant difference!")
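
By default, ttest_ind assumes both groups have the same variance (Student's t-test). When that assumption is doubtful, passing equal_var=False runs Welch's t-test instead. The sketch below compares the two on the groups above; the 0.05 significance level is the conventional choice, not a requirement.

```python
from scipy import stats

group_a = [85, 90, 78, 92, 88]
group_b = [70, 75, 68, 80, 72]

# Student's t-test (default): assumes both groups share the same variance
student = stats.ttest_ind(group_a, group_b)

# Welch's t-test: drops the equal-variance assumption
welch = stats.ttest_ind(group_a, group_b, equal_var=False)

alpha = 0.05  # significance level chosen for this illustration
for name, result in [("Student", student), ("Welch", welch)]:
    verdict = "reject H0" if result.pvalue < alpha else "fail to reject H0"
    print(f"{name}: t = {result.statistic:.3f}, p = {result.pvalue:.4f} -> {verdict}")
```

Because the two groups have the same size, both variants give the same t-statistic here; only the degrees of freedom, and therefore the p-value, differ slightly.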

2. Chi-Square Test (Categorical Data)

Tests whether two categorical variables are independent.

observed = [[30, 10], [20, 40]]
chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"Chi-Square: {chi2:.3f}, p-value: {p_value:.3f}")
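
The expected frequencies returned by chi2_contingency are row total × column total / grand total, and the statistic sums (observed - expected)² / expected over all cells. The sketch below rebuilds both by hand. Note that for a 2×2 table SciPy applies Yates' continuity correction by default, so the manual value matches only when correction=False is passed.

```python
import numpy as np
from scipy import stats

observed = np.array([[30, 10], [20, 40]])

# Expected counts under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / observed.sum()

# Pearson chi-square statistic without continuity correction
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# Default call (Yates' correction applied because dof == 1)
chi2_corrected, p_corrected, dof, expected = stats.chi2_contingency(observed)
# Uncorrected call, comparable to the manual computation
chi2_plain, p_plain, _, _ = stats.chi2_contingency(observed, correction=False)

print("Expected counts:\n", expected)
print(f"manual chi2 = {chi2_manual:.3f}, uncorrected = {chi2_plain:.3f}, "
      f"corrected = {chi2_corrected:.3f}, dof = {dof}")
```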

3. ANOVA (Compare 3+ Groups)

Tests whether the means of three or more groups are all equal. A significant result means that at least one group mean differs from the others.

group1 = [85, 90, 78, 92]
group2 = [70, 75, 68, 80]
group3 = [60, 65, 58, 72]
f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F-statistic: {f_stat:.3f}, p-value: {p_value:.4f}")
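
The F-statistic is the ratio of between-group variance to within-group variance. As a rough check, it can be computed by hand and compared with f_oneway. The helper names below (ss_between, ss_within, f_manual) are illustrative, not SciPy API.

```python
import numpy as np
from scipy import stats

groups = [
    np.array([85, 90, 78, 92]),
    np.array([70, 75, 68, 80]),
    np.array([60, 65, 58, 72]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)            # number of groups
n_total = all_values.size  # total number of observations

# Between-group sum of squares: spread of group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of values around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

result = stats.f_oneway(*groups)
print(f"manual: F = {f_manual:.3f}, p = {p_manual:.4f}")
print(f"scipy:  F = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

ANOVA only says that some difference exists; it does not identify which groups differ.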

Statistical Tests Summary

| Test | Function | Use Case |
| --- | --- | --- |
| One-Sample t-Test | stats.ttest_1samp() | Compare sample mean to known value |
| Two-Sample t-Test | stats.ttest_ind() | Compare means of 2 independent groups |
| Paired t-Test | stats.ttest_rel() | Compare before/after measurements |
| Chi-Square | stats.chi2_contingency() | Independence of categorical variables |
| ANOVA | stats.f_oneway() | Compare means of 3+ groups |
| Mann-Whitney U | stats.mannwhitneyu() | Non-parametric alternative to the two-sample t-test |
| Shapiro-Wilk | stats.shapiro() | Test for normality |
| Pearson Correlation | stats.pearsonr() | Linear correlation between two variables |
| Spearman Correlation | stats.spearmanr() | Rank-based monotonic correlation (need not be linear) |
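
The tests in the table that have no example above follow the same calling pattern. The sketch below uses hypothetical before/after scores (invented for this illustration) for the paired and correlation tests, and the two independent groups from the t-test example for Mann-Whitney U.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for the same 8 students before and after a course
before = np.array([72, 68, 75, 80, 66, 71, 78, 69])
after = np.array([75, 70, 74, 85, 70, 74, 80, 73])

# Paired t-test: works on the per-student differences
paired = stats.ttest_rel(before, after)

# Shapiro-Wilk: checks whether the differences look normally distributed
normality = stats.shapiro(after - before)

# Pearson (linear) and Spearman (rank-based) correlation
pearson_r, pearson_p = stats.pearsonr(before, after)
spearman_rho, spearman_p = stats.spearmanr(before, after)

# Mann-Whitney U: compares two independent groups without assuming normality
group_a = [85, 90, 78, 92, 88]
group_b = [70, 75, 68, 80, 72]
rank_test = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"Paired t-test: t = {paired.statistic:.3f}, p = {paired.pvalue:.4f}")
print(f"Shapiro-Wilk:  W = {normality.statistic:.3f}, p = {normality.pvalue:.3f}")
print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")
print(f"Mann-Whitney U = {rank_test.statistic}, p = {rank_test.pvalue:.4f}")
```

A Shapiro-Wilk p-value above the chosen significance level means normality is not rejected, which is the usual precondition check before a parametric test such as the paired t-test.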

---

Probability Distributions

SciPy provides access to 100+ probability distributions:

# Normal Distribution
from scipy.stats import norm
x = norm.rvs(loc=0, scale=1, size=1000)  # Generate random samples
pdf = norm.pdf(0)                          # Probability density at x=0
cdf = norm.cdf(1.96)                       # Cumulative probability up to 1.96
ppf = norm.ppf(0.975)                      # Inverse CDF (percentile)

| Method | Description |
| --- | --- |
| .rvs() | Random samples |
| .pdf() | Probability Density Function |
| .cdf() | Cumulative Distribution Function |
| .ppf() | Percent Point Function (inverse CDF) |
| .mean() | Distribution mean |
| .std() | Distribution standard deviation |
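
A short sketch of how these methods relate to each other: ppf undoes cdf, and a distribution can be "frozen" with fixed parameters so they need not be repeated on every call. The score distribution (mean 100, standard deviation 15) is a hypothetical example.

```python
from scipy.stats import norm

# ppf is the inverse of cdf: the z-value below which 97.5% of the mass lies
z_critical = norm.ppf(0.975)          # about 1.96
round_trip = norm.cdf(z_critical)     # back to 0.975

# Probability mass within +/- 1.96 standard deviations (about 95%)
central_mass = norm.cdf(1.96) - norm.cdf(-1.96)

# Frozen distribution with hypothetical parameters: mean 100, std 15
scores = norm(loc=100, scale=15)
share_above_130 = scores.sf(130)      # survival function = 1 - cdf

print(f"z for 97.5th percentile: {z_critical:.4f}")
print(f"P(-1.96 < Z < 1.96): {central_mass:.4f}")
print(f"mean = {scores.mean()}, std = {scores.std()}")
print(f"P(score > 130): {share_above_130:.4f}")
```

The same method names work on the other distributions in scipy.stats, so switching distribution usually means changing only the import and the parameters.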

---

scipy.optimize — Curve Fitting

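import numpy as np
from scipy.optimize import curve_fit

# Straight-line model: curve_fit estimates the parameters a and b
def model(x, a, b):
    return a * x + b

x_data = np.array([1, 2, 3, 4, 5])
y_data = np.array([2.2, 4.1, 5.8, 8.3, 9.9])

# params holds the fitted [a, b]; covariance is their estimated covariance matrix
params, covariance = curve_fit(model, x_data, y_data)
print(f"a = {params[0]:.2f}, b = {params[1]:.2f}")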
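
The covariance matrix returned by curve_fit is often used to attach uncertainties to the fitted parameters: the square roots of its diagonal give one-standard-deviation errors. The sketch below repeats the fit and adds that step plus an R² value as a simple goodness-of-fit check; the names std_errors and r_squared are chosen here for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * x + b

x_data = np.array([1, 2, 3, 4, 5])
y_data = np.array([2.2, 4.1, 5.8, 8.3, 9.9])

params, covariance = curve_fit(model, x_data, y_data)

# One-standard-deviation uncertainty of each fitted parameter
std_errors = np.sqrt(np.diag(covariance))

# R^2: share of the variance in y explained by the fitted model
residuals = y_data - model(x_data, *params)
r_squared = 1 - np.sum(residuals**2) / np.sum((y_data - y_data.mean()) ** 2)

print(f"a = {params[0]:.2f} +/- {std_errors[0]:.2f}")
print(f"b = {params[1]:.2f} +/- {std_errors[1]:.2f}")
print(f"R^2 = {r_squared:.4f}")
```

For a model that is nonlinear in its parameters, supplying an initial guess through the p0 argument usually helps the fit converge.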

---

SciPy in Data Science

| Application | SciPy Module | How It's Used |
| --- | --- | --- |
| A/B Testing | scipy.stats | t-tests to compare conversion rates |
| Feature Selection | scipy.stats | Chi-square tests for categorical features |
| Normality Testing | scipy.stats | Shapiro-Wilk test before parametric tests |
| Optimization | scipy.optimize | Minimizing loss functions |
| Interpolation | scipy.interpolate | Filling gaps in time series data |
| Sparse Data | scipy.sparse | Efficient storage for text data (TF-IDF) |
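
Two rows of the table, loss minimization and sparse storage, have no example earlier in the lesson. The sketch below is one possible illustration: it minimizes a mean-squared-error loss for a straight line using the curve-fitting data, and stores a small, made-up word-count matrix in compressed sparse row (CSR) format. The loss function and the matrix are assumptions for demonstration only.

```python
import numpy as np
from scipy import optimize, sparse

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 4.1, 5.8, 8.3, 9.9])

def mse_loss(weights):
    """Mean squared error of the line y = slope * x + intercept."""
    slope, intercept = weights
    return np.mean((y - (slope * x + intercept)) ** 2)

# Start from slope = 0, intercept = 0 and let the optimizer reduce the loss
fit = optimize.minimize(mse_loss, x0=[0.0, 0.0])
print(f"slope = {fit.x[0]:.3f}, intercept = {fit.x[1]:.3f}, loss = {fit.fun:.4f}")

# Hypothetical document-term counts: most entries are zero
dense_counts = np.array([
    [0, 0, 3, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 2],
])
sparse_counts = sparse.csr_matrix(dense_counts)  # stores only the non-zero entries
print("shape:", sparse_counts.shape, "stored values:", sparse_counts.nnz)
```

The minimizer should land on roughly the same slope and intercept as curve_fit, since both solve the same least-squares problem.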

Summary

  • SciPy extends NumPy with advanced scientific computing functionality.
  • scipy.stats is usually the module data scientists rely on most: it provides hypothesis testing, distributions, and correlations.
  • t-tests, chi-square, and ANOVA are essential for statistical analysis and A/B testing.
  • SciPy's probability distribution functions (pdf, cdf, ppf) are used for statistical modeling.
  • Optimization and curve fitting tools help in model development.