Handling Missing Values — Data Science Notes

Handling Missing Values

Missing values are one of the most pervasive data quality issues in real-world datasets. How you handle them can significantly impact the accuracy and reliability of your analysis and models.

Formal Definition

A missing value (also called a null value, NaN, or NA) is a data point where no value is stored for the observation in a particular variable. In Python and pandas, missing values are typically represented as NaN (Not a Number) or None; pandas also uses NaT for missing datetimes and pd.NA for its nullable dtypes.
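
As a small illustration of the definition above, the sketch below builds two hypothetical Series (the names `s_num` and `s_obj` are made up for this example) and shows that pandas treats both None and NaN as missing. It also shows why an equality test against NaN does not work for detection.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: None and np.nan both mark a missing entry.
s_num = pd.Series([1.0, None, 3.0])   # None is stored as NaN in a float column
s_obj = pd.Series(["a", None, "c"])   # missing entry in a text column

print(s_num.isna().tolist())
print(s_obj.isna().tolist())

# NaN never compares equal to anything, including itself,
# so use isna() / isnull() instead of "== np.nan".
print(np.nan == np.nan)
print((s_num == np.nan).any())
```

In both Series only the middle entry is flagged as missing, and both equality checks come out False.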

---

Why Values Go Missing

Cause	Description	Example
Human Error	Manual data entry mistakes	Forgetting to fill a form field
System Failure	Sensor malfunction or software bugs	Temperature sensor offline for 2 hours
Survey Design	Respondent skips optional questions	"Prefer not to answer" on income
Data Merging	Joining datasets with mismatched keys	Left join creates NaN for unmatched rows
Privacy Restrictions	Sensitive data intentionally withheld	Medical records with redacted fields
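
The "Data Merging" row can be reproduced with a short sketch. The two tables and their column names below are hypothetical; the point is only that a left join leaves NaN wherever the right-hand table has no matching key, and that `indicator=True` helps trace which rows were affected.

```python
import pandas as pd

# Hypothetical tables: student 2 has no entry in the scores table.
students = pd.DataFrame({"student_id": [1, 2, 3], "name": ["Asha", "Ravi", "Meena"]})
scores = pd.DataFrame({"student_id": [1, 3], "score": [82, 91]})

# A left join keeps every student; unmatched rows get NaN in "score".
merged = students.merge(scores, on="student_id", how="left", indicator=True)
print(merged)

# The _merge column marks rows that had no partner in the right table.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched["student_id"].tolist())
```

Knowing that a NaN came from a failed join, rather than from the original data, usually changes how it should be handled.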

---

Types of Missing Data

Understanding why data is missing is crucial for choosing the right strategy:

Type	Full Name	Description	Example
MCAR	Missing Completely at Random	Missingness has no relationship to any variable	Random sensor glitch
MAR	Missing at Random	Missingness depends on observed variables but not the missing value itself	Men less likely to report weight
MNAR	Missing Not at Random	Missingness depends on the missing value itself	High-income people refusing to report income

Why This Matters:

MCAR: Safe to delete rows; no bias introduced.
MAR: Imputation methods are appropriate.
MNAR: Most challenging; requires domain knowledge or specialized models.
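
To make the three mechanisms concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the synthetic population, the link between age and income, and the missingness probabilities. It hides income values under each mechanism and compares the mean of the remaining values with the true mean.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical population in which income rises with age.
age = rng.integers(20, 60, size=n)
income = 20_000 + 1_000 * age + rng.normal(0, 5_000, size=n)
df = pd.DataFrame({"age": age, "income": income})

# MCAR: every row has the same 30% chance of losing its income value.
mcar = rng.random(n) < 0.3
# MAR: the chance depends on an observed column (age), not on income itself.
mar = rng.random(n) < np.where(df["age"] > 40, 0.5, 0.1)
# MNAR: the chance depends on the income value that goes missing.
mnar = rng.random(n) < np.where(df["income"] > df["income"].median(), 0.5, 0.1)

true_mean = df["income"].mean()
biases = {}
for name, mask in {"MCAR": mcar, "MAR": mar, "MNAR": mnar}.items():
    observed_mean = df.loc[~mask, "income"].mean()
    biases[name] = observed_mean - true_mean
    print(f"{name}: bias of observed mean = {biases[name]:,.0f}")
```

In this setup the MCAR bias stays close to zero, while the MAR and MNAR cases underestimate income. The difference is that the MAR bias can be explained by the observed age column, which is what imputation methods rely on; in the MNAR case the dataset holds no record of the cause.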

---

Detecting Missing Values

import pandas as pd
import numpy as np

df = pd.read_csv("data.csv")

# Total missing values per column
print(df.isnull().sum())

# Percentage of missing values per column
print((df.isnull().sum() / len(df)) * 100)

# Heatmap visualization of missing values
import seaborn as sns
import matplotlib.pyplot as plt
sns.heatmap(df.isnull(), cbar=True, yticklabels=False, cmap="viridis")
plt.title("Missing Value Heatmap")
plt.show()
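
The count and percentage checks above can be combined into one summary table. The sketch below uses a small hypothetical DataFrame (`toy`) so that it runs without a CSV file.

```python
import numpy as np
import pandas as pd

# Hypothetical data used only for illustration.
toy = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "income": [np.nan, np.nan, 52_000, 61_000],
    "city": ["Delhi", "Pune", None, "Jaipur"],
})

# One row per column: absolute count and percentage of missing values.
summary = pd.DataFrame({
    "missing": toy.isna().sum(),
    "percent": toy.isna().mean() * 100,
}).sort_values("percent", ascending=False)
print(summary)

# Number of rows that contain at least one missing value.
rows_with_any = int(toy.isna().any(axis=1).sum())
print(rows_with_any)
```

The row-level count matters because listwise deletion removes rows, not cells: here three of the four rows would be lost even though no column is more than 50% empty.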

---

Strategies for Handling Missing Values

1. Deletion Methods

a) Listwise Deletion (Dropping Rows)

Remove entire rows that contain any missing value.

# Drop all rows with any NaN
df_clean = df.dropna()

# Drop rows where specific columns have NaN
df_clean = df.dropna(subset=["age", "income"])

When to use: When the percentage of missing data is very small (< 5%) and data is MCAR.
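
One way to apply the 5% rule of thumb is to measure the row loss before calling dropna(). The helper below is a hypothetical sketch (its name and the `max_loss` parameter are not part of pandas); it cannot check whether the data is MCAR, which still needs judgment.

```python
import numpy as np
import pandas as pd


def drop_rows_if_cheap(frame: pd.DataFrame, max_loss: float = 0.05) -> pd.DataFrame:
    """Drop incomplete rows only if at most `max_loss` of all rows would be lost."""
    loss = frame.isna().any(axis=1).mean()  # share of rows with any NaN
    if loss > max_loss:
        raise ValueError(f"dropna() would discard {loss:.1%} of rows")
    return frame.dropna()


# Hypothetical data: 100 rows, 2 of them incomplete (2% loss).
toy = pd.DataFrame({"x": np.arange(100, dtype=float)})
toy.loc[[3, 50], "x"] = np.nan

cleaned = drop_rows_if_cheap(toy)
print(len(toy), "->", len(cleaned))
```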

b) Column Deletion (Dropping Columns)

Remove entire columns that have too many missing values.

# Drop columns with more than 50% missing values
threshold = 0.5
df_clean = df.loc[:, df.isnull().mean() < threshold]

When to use: When a column has >50% missing values and is not critical for analysis.
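
Before dropping columns it can help to list which ones cross the threshold, so that a critical column is not removed silently. A minimal sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "b" is 75% missing, column "c" is 25% missing.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [np.nan, np.nan, np.nan, 1.0],
    "c": [1.0, np.nan, 3.0, 4.0],
})

threshold = 0.5
missing_share = toy.isna().mean()

# Columns at or above the threshold are candidates for removal.
to_drop = missing_share[missing_share >= threshold].index.tolist()
print("Dropping:", to_drop)

kept = toy.drop(columns=to_drop)
print(kept.columns.tolist())
```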

---

2. Imputation Methods (Filling Missing Values)

a) Mean/Median/Mode Imputation

# Fill with mean (for normally distributed numerical data)
df["age"] = df["age"].fillna(df["age"].mean())

# Fill with median (for skewed numerical data; more robust to outliers)
df["income"] = df["income"].fillna(df["income"].median())

# Fill with mode (for categorical data); mode() returns a Series, so take its first value
df["city"] = df["city"].fillna(df["city"].mode()[0])
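
The sketch below illustrates why the median is preferred for skewed data, using a hypothetical income column with one extreme value. It also shows a common practice that is an addition to these notes: compute the fill value on the training split only and reuse it on the test split, so that no information from the test data leaks into preprocessing.

```python
import numpy as np
import pandas as pd

# Hypothetical train/test splits; 500,000 is an outlier.
train = pd.DataFrame({"income": [30_000.0, 35_000.0, np.nan, 40_000.0, 500_000.0]})
test = pd.DataFrame({"income": [np.nan, 45_000.0]})

train_mean = train["income"].mean()      # pulled upward by the outlier
train_median = train["income"].median()  # stays near the typical value
print(train_mean, train_median)

# Fill both splits with the statistic learned from the training data.
train["income"] = train["income"].fillna(train_median)
test["income"] = test["income"].fillna(train_median)
print(test)
```

Here the mean is 151,250 while the median is 37,500, so a mean fill would assign an income that none of the typical rows come close to.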

b) Forward Fill and Backward Fill (for time series)

# Forward fill: carry the last known value forward
df["temperature"] = df["temperature"].ffill()

# Backward fill (alternative): use the next known value
df["temperature"] = df["temperature"].bfill()
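
Long gaps are a risk with forward fill, because a stale reading can be carried across many time steps. The sketch below, on a hypothetical temperature series, shows two options that pandas provides: capping the fill with `limit`, and linear interpolation between the surrounding readings.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings with a three-step gap.
temp = pd.Series([20.0, np.nan, np.nan, np.nan, 24.0, 25.0])

# Carry the last value forward, but across at most one missing step.
filled_ffill = temp.ffill(limit=1)
print(filled_ffill.tolist())

# Linear interpolation draws a straight line between known neighbours.
filled_interp = temp.interpolate()
print(filled_interp.tolist())
```

With `limit=1` two of the three gaps stay missing, which makes the long outage visible instead of hiding it. Interpolation fills the gap with 21, 22 and 23.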

c) Constant Value Imputation

# Fill with a constant value
df["status"] = df["status"].fillna("Unknown")
df["score"] = df["score"].fillna(0)
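
A constant such as 0 is indistinguishable from a real score of 0 once it has been filled in. One possible safeguard, sketched here with a hypothetical `score_was_missing` column, is to record which entries were missing before filling them.

```python
import numpy as np
import pandas as pd

# Hypothetical data used only for illustration.
scores = pd.DataFrame({"score": [78.0, np.nan, 91.0, np.nan]})

# Record the missingness first, then fill.
scores["score_was_missing"] = scores["score"].isna()
scores["score"] = scores["score"].fillna(0)
print(scores)
```

The indicator column lets later analysis or a model separate true zeros from filled values.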

d) K-Nearest Neighbors (KNN) Imputation

Uses the values of similar records to predict missing values.

from sklearn.impute import KNNImputer

# KNN imputation works on numeric columns only
numeric = df.select_dtypes(include=[np.number])

imputer = KNNImputer(n_neighbors=5)
df_imputed = pd.DataFrame(
    imputer.fit_transform(numeric),
    columns=numeric.columns,
    index=numeric.index,  # keep the original row labels
)
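
A tiny worked case can make the idea visible. In the hypothetical table below, row "c" has no score; with `n_neighbors=2` the imputer looks for the two rows closest in `hours` that do have a score (rows "b" and "d") and averages their values.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical data: study hours and exam score, one score missing.
toy = pd.DataFrame(
    {"hours": [1.0, 2.0, 3.0, 4.0], "score": [10.0, 20.0, np.nan, 40.0]},
    index=["a", "b", "c", "d"],
)

imputer = KNNImputer(n_neighbors=2)
filled = pd.DataFrame(
    imputer.fit_transform(toy), columns=toy.columns, index=toy.index
)
print(filled.loc["c", "score"])
```

The filled value is the mean of 20 and 40. Because the method is distance-based, columns measured on very different scales may need scaling first so that one column does not dominate the neighbour search.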

e) Iterative Imputation (MICE - Multiple Imputation by Chained Equations)

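# IterativeImputer is experimental; this import must come first to enable it
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Iterative imputation works on numeric columns only
numeric = df.select_dtypes(include=[np.number])

imputer = IterativeImputer(max_iter=10, random_state=42)
df_imputed = pd.DataFrame(
    imputer.fit_transform(numeric),
    columns=numeric.columns,
    index=numeric.index,  # keep the original row labels
)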
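
The sketch below runs iterative imputation on a hypothetical two-column table where y is roughly 2x and one y value is hidden. The imputer models each incomplete column from the other columns, so the filled value should land near the value the relationship implies. Note that scikit-learn's IterativeImputer returns a single completed dataset by default; drawing several imputations (the "multiple" in MICE) would need repeated runs with `sample_posterior=True` and different seeds.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical data: y is about 2 * x, with a small alternating offset.
x = np.arange(1.0, 11.0)
offset = np.where(np.arange(10) % 2 == 0, 0.1, -0.1)
toy = pd.DataFrame({"x": x, "y": 2 * x + offset})
toy.loc[4, "y"] = np.nan  # hide the value at x = 5 (about 10)

imputer = IterativeImputer(max_iter=10, random_state=0)
filled = pd.DataFrame(
    imputer.fit_transform(toy), columns=toy.columns, index=toy.index
)
print(filled.loc[4, "y"])
```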

---

Comparison of Imputation Strategies

Method	Best For	Pros	Cons
Deletion	Small % of missing, MCAR	Simple, no bias if MCAR	Loses data, biased if not MCAR
Mean	Normally distributed numeric data	Easy to implement	Distorts variance, sensitive to outliers
Median	Skewed numeric data	Robust to outliers	Ignores feature relationships
Mode	Categorical data	Simple, preserves categories	Over-represents dominant category
Forward/Backward Fill	Time-series data	Preserves temporal patterns	Not suitable for non-sequential data
KNN	Moderately missing data	Uses inter-feature relationships	Computationally expensive for large data
MICE	Complex multivariate data	Most sophisticated, handles MAR	Slow, complex to tune
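
The "distorts variance" entry for mean imputation can be checked directly. The sketch below uses synthetic, hypothetical data: it hides about 30% of a column, fills the gaps with the mean, and compares the spread before and after.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical column with mean 50 and standard deviation 10.
values = pd.Series(rng.normal(50, 10, size=1_000))

# Hide roughly 30% of the values completely at random.
with_gaps = values.mask(rng.random(1_000) < 0.3)

mean_filled = with_gaps.fillna(with_gaps.mean())

print("std of observed values:", with_gaps.std())
print("std after mean fill:   ", mean_filled.std())
```

The mean itself is unchanged, but every filled value sits exactly at the centre, so the standard deviation shrinks. Anything that depends on spread, such as confidence intervals or correlations, is affected.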

---

Summary

Missing values arise from human error, system failures, survey design, data merging, and privacy restrictions.
Understanding the type of missingness (MCAR, MAR, MNAR) guides the correct approach.
Deletion is simple but can lose information; imputation preserves data but introduces assumptions.
Advanced methods like KNN and MICE use relationships between features for more accurate imputation.
Always validate the impact of your chosen strategy on downstream analysis.
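
One way to validate a strategy is to hide values that are actually known, impute them, and measure the error against the truth. The sketch below does this on a hypothetical trending series and compares a mean fill with linear interpolation; the data, the 20% masking rate and the choice of RMSE as the metric are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical complete series: a linear trend plus small noise.
t = np.arange(200)
truth = pd.Series(0.5 * t + rng.normal(0, 1, size=200))

# Hide about 20% of the known values.
hidden = rng.random(200) < 0.2
observed = truth.mask(hidden)

candidates = {
    "mean": observed.fillna(observed.mean()),
    "interpolate": observed.interpolate(limit_direction="both"),
}


def rmse(filled: pd.Series) -> float:
    """Root mean squared error on the positions that were hidden."""
    err = filled[hidden] - truth[hidden]
    return float(np.sqrt((err ** 2).mean()))


scores = {name: rmse(filled) for name, filled in candidates.items()}
print(scores)
```

On this trending series interpolation recovers the hidden values far better than the mean. On a different dataset the ranking may change, which is the reason to run the check on your own data.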