Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.


3. Exploratory Data Analysis (EDA)

Lesson 3 of 21 in the free Machine Learning notes on Siksha Sarovar, written by Rohit Jangra.

What is EDA?

Exploratory Data Analysis (EDA) is an approach to analyzing data sets in order to summarize their main characteristics, often with visual methods. It is the critical "first look" at the data before any formal modeling begins. Its goals are to understand the structure of the data, detect outliers, identify patterns, and check the assumptions that later modeling steps rely on.
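As one illustration of the "detect outliers" goal, here is a minimal sketch using the common 1.5 × IQR rule of thumb. The rule and the sample values are assumptions chosen for illustration; the lesson does not prescribe a specific outlier method.

```python
import pandas as pd

# Hypothetical measurements; the last value is deliberately extreme.
values = pd.Series([12, 14, 15, 13, 16, 14, 95])

# Interquartile range: the spread of the middle 50% of the data.
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Rule of thumb (an assumption, not a fixed standard): flag points
# lying more than 1.5 * IQR outside the quartiles.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

print(outliers.tolist())  # [95]
```

A flagged value is not necessarily an error. It is a prompt to look at that record more closely before deciding whether to keep, correct, or drop it.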

Key Tools: Pandas & NumPy

  • Pandas: The powerhouse of data manipulation in Python. It provides the DataFrame structure, essentially a programmable Excel sheet.
  • NumPy: The fundamental package for scientific computing. It adds support for large, multi-dimensional arrays and matrices.
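To make the relationship between the two libraries concrete, the sketch below builds a small NumPy array and wraps it in a Pandas DataFrame. The column names and numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# A 2-D NumPy array: 3 rows, 2 columns of hypothetical measurements.
arr = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(arr.shape)         # (3, 2)
print(arr.mean(axis=0))  # column means: [3. 4.]

# Wrapping the same values in a DataFrame adds column labels,
# so columns can be selected by name instead of by position.
df = pd.DataFrame(arr, columns=["height", "width"])
print(df["height"].mean())  # 3.0
```

The array is addressed by position, while the DataFrame lets you refer to columns by name, which is usually more readable during exploration.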

Core EDA Tasks

| Task | Description | Pandas Command (Example) |
| --- | --- | --- |
| Shape Analysis | How big is the data? | df.shape |
| Data Types | What kind of data is in each column? | df.info() or df.dtypes |
| Missing Values | Where are the gaps? | df.isnull().sum() |
| Descriptive Stats | Mean, Median, Min, Max. | df.describe() |
| Correlation | Do variables move together? | df.corr() |
| Value Counts | Frequency of categorical data. | df['Category'].value_counts() |
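The commands above can be tried together on a small DataFrame. This is a minimal sketch: the data and the column names price and quantity are invented for illustration, and one price is left missing on purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical data set with one missing price.
df = pd.DataFrame({
    "Category": ["A", "B", "A", "C", "A"],
    "price": [10.0, 20.0, np.nan, 40.0, 50.0],
    "quantity": [1, 2, 3, 4, 5],
})

# Shape analysis: (rows, columns).
print(df.shape)  # (5, 3)

# Data types and non-null counts per column.
df.info()

# Missing values per column.
missing = df.isnull().sum()
print(missing)

# Descriptive statistics for the numeric columns
# (the 50% row is the median).
print(df.describe())

# Correlation between numeric columns. Selecting numeric columns
# first avoids errors that recent pandas versions raise when
# df.corr() meets a text column such as "Category".
corr = df.select_dtypes(include="number").corr()
print(corr)

# Frequency of each category.
counts = df["Category"].value_counts()
print(counts)
```

In this toy data every known price is exactly ten times the quantity, so the correlation between the two columns comes out at 1.0; real data will rarely be this tidy.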

Why Does EDA Matter?

"Garbage In, Garbage Out": if you feed a model dirty, unanalyzed data, it will produce unreliable predictions. EDA helps ensure that the data you feed your model is of good quality.