1.6 Data Quality & Outliers — Data Visualisation and Analytics Notes

Data Quality: Missing Values & Outliers

1. Missing Values

Handling missing data is critical because most statistical methods and machine learning algorithms cannot process incomplete inputs.

Study Deep: MCAR, MAR, and MNAR

Understanding why data is missing is more important than knowing how to fill it.

MCAR (Missing Completely at Random): No pattern. (e.g., a sensor battery died). Solution: delete the rows or mean-impute.
MAR (Missing at Random): Missingness depends on other observed data. (e.g., younger users skip the 'Annual Income' field). Solution: impute using Age.
MNAR (Missing Not at Random): Missingness depends on the missing value itself. (e.g., users with very high income skip the field to protect privacy). Solution: requires advanced domain modeling.
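The practical difference between the three mechanisms shows up in a small simulation. The sketch below is illustrative only: the population, the 30% and 70% skip rates and the 50,000 threshold are all made-up assumptions, chosen so that each mechanism is easy to see in the observed mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

# Hypothetical population: income rises with age, plus random noise.
people = []
for _ in range(10_000):
    age = random.randint(20, 60)
    income = 20_000 + 1_000 * (age - 20) + random.gauss(0, 5_000)
    people.append((age, income))

true_mean = statistics.mean(inc for _, inc in people)

# MCAR: every record has the same 30% chance of losing its income value.
mcar = [inc for _, inc in people if random.random() >= 0.30]

# MAR: missingness depends on age (observed); under-30s skip the field 70% of the time.
mar = [inc for age, inc in people if not (age < 30 and random.random() < 0.70)]

# MNAR: missingness depends on the income itself; values above 50,000 are withheld 70% of the time.
mnar = [inc for _, inc in people if not (inc > 50_000 and random.random() < 0.70)]

mcar_mean = statistics.mean(mcar)
mar_mean = statistics.mean(mar)
mnar_mean = statistics.mean(mnar)

print(f"True mean: {true_mean:,.0f}")
print(f"MCAR mean: {mcar_mean:,.0f}  (close to the true mean)")
print(f"MAR mean : {mar_mean:,.0f}  (too high: low-income young users are under-represented)")
print(f"MNAR mean: {mnar_mean:,.0f}  (too low: high incomes are under-represented)")
```

Under MAR the bias can be corrected because Age is still observed; under MNAR the observed data alone cannot reveal what was withheld.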

Type	Abbreviation	Definition	Example	Implication
Missing Completely at Random	MCAR	No pattern — missingness is unrelated to any variable	A respondent accidentally skipped a question	Safe to delete rows (no bias introduced)
Missing at Random	MAR	Missingness depends on observed variables, not the missing value itself	Women are less likely to report age, but this is related to gender (observed), not age itself	Use imputation based on observed data
Missing Not at Random	MNAR	Missingness depends on the missing value itself	High-income earners skip the "Salary" field because their salary is high	Most difficult — may need domain-specific models
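For MAR data, "imputation based on observed data" can be as simple as filling each gap from records that share the same observed attribute. The following is a minimal sketch with invented rows and a hypothetical `age_group` field; it is one possible approach, not the only one.

```python
from collections import defaultdict
from statistics import median

# Hypothetical survey rows; None marks a missing income.
rows = [
    {"age_group": "18-25", "income": 22_000},
    {"age_group": "18-25", "income": None},
    {"age_group": "18-25", "income": 26_000},
    {"age_group": "40-50", "income": 80_000},
    {"age_group": "40-50", "income": None},
    {"age_group": "40-50", "income": 90_000},
]

# Collect the observed incomes per age group.
observed = defaultdict(list)
for row in rows:
    if row["income"] is not None:
        observed[row["age_group"]].append(row["income"])

group_median = {group: median(values) for group, values in observed.items()}
overall_median = median(v for values in observed.values() for v in values)

# Fill each gap with the median of its own age group.
imputed = [
    {**row, "income": row["income"] if row["income"] is not None
                      else group_median[row["age_group"]]}
    for row in rows
]

print(group_median)    # {'18-25': 24000.0, '40-50': 85000.0}
print(overall_median)  # 53000.0, a value that fits neither group
```

A single overall median would place both missing respondents at 53,000, which is why the observed variable is used to condition the fill.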

Treatment Methods — Decision Guide:

Method	Description	When to Use	Pros	Cons
Listwise Deletion	Remove entire rows with any missing data	Missing data < 5%, MCAR	Simple, preserves relationships	Loses data, biased if not MCAR
Pairwise Deletion	Use available data for each specific analysis	Moderate missingness	Maximizes available data	Inconsistent sample sizes
Mean Imputation	Fill with column average	Normal numerical data, MCAR	Simple, fast	Reduces variance, distorts distribution
Median Imputation	Fill with column median	Skewed numerical data with outliers	Robust to outliers	Still distorts distribution
Mode Imputation	Fill with most frequent value	Categorical data	Works for non-numeric data	Can overrepresent one category
Forward/Backward Fill	Fill with the previous (forward) or next (backward) observed value	Time-series data	Maintains temporal patterns	Can propagate errors
KNN Imputation	Use K-Nearest Neighbors to predict missing value	Complex patterns, sufficient data	Considers relationships between features	Computationally expensive
Multiple Imputation	Create several plausible imputed datasets, analyse each, and pool the results	Research / clinical data	Most statistically rigorous	Complex to implement
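Two rows of the table are easy to check by hand: forward/backward fill, and the "reduces variance" drawback of mean imputation. The sketch below uses plain Python so each step is visible; the sensor readings are invented, and the helper names `forward_fill` and `backward_fill` are illustrative, not library functions.

```python
import statistics

def forward_fill(values):
    """Replace each None with the most recent earlier non-missing value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None if nothing has been seen yet
    return filled

def backward_fill(values):
    """Replace each None with the next later non-missing value."""
    return forward_fill(values[::-1])[::-1]

readings = [21.0, None, None, 23.5, None, 22.0]  # hypothetical time series
print(forward_fill(readings))   # [21.0, 21.0, 21.0, 23.5, 23.5, 22.0]
print(backward_fill(readings))  # [21.0, 23.5, 23.5, 23.5, 22.0, 22.0]

# Mean imputation: the filled value sits exactly at the centre, so spread shrinks.
observed = [50_000, 60_000, 45_000, 70_000, 55_000]
mean_filled = observed + [statistics.mean(observed)]  # one missing value filled

var_before = statistics.variance(observed)    # 92,500,000
var_after = statistics.variance(mean_filled)  # 74,000,000
print(var_before, var_after)
```

The sum of squared deviations is unchanged by the filled value, but it is now divided across one more observation, which is the variance reduction the table warns about.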

2. Outliers

Formal Definition: An outlier is a data observation that lies at an abnormal distance from other values in the sample. Statistically, it is a point that falls outside the expected range of the data distribution.

Example: Salaries: [40k, 42k, 45k, 1M, 43k] -> 1M is an outlier.

Types of Outliers:

Point Outlier: A single data point far from the rest (e.g., Age = 200 in a human dataset).
Contextual Outlier: Abnormal in a specific context (e.g., 30°C is normal in summer, outlier in winter).
Collective Outlier: A subset of data points that is anomalous as a group (e.g., sudden traffic spike on a server).
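A contextual outlier only becomes visible when a value is compared against its own context. As a rough sketch, assuming a made-up temperature history per season, the same 30°C reading can be scored against each season separately:

```python
import statistics

# Hypothetical temperature history (°C) for each season.
history = {
    "summer": [28, 30, 31, 29, 32, 30],
    "winter": [2, 4, 5, 3, 6, 4],
}

def contextual_z(value, season):
    """Z-score of `value` relative to readings from the same season only."""
    readings = history[season]
    return (value - statistics.mean(readings)) / statistics.stdev(readings)

z_summer = contextual_z(30, "summer")
z_winter = contextual_z(30, "winter")

print(f"30°C in summer: z = {z_summer:.2f}")  # 0.00, perfectly ordinary
print(f"30°C in winter: z = {z_winter:.2f}")  # 18.38, far outside the usual range
```

The history here is assumed to be clean reference data; scoring a value against a sample that already contains it runs into the masking problem that affects the plain Z-score.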

Detection Methods:

Method	Formula / Rule	Assumption	Best For
Z-Score	`Z = (X - μ) / σ`; Outlier if |Z| > 3	Data follows Normal Distribution	Normally distributed data
IQR	`IQR = Q3 - Q1`; Outlier if X < Q1 - 1.5 × IQR or X > Q3 + 1.5 × IQR	None (non-parametric)	Skewed data, general-purpose
Modified Z-Score	Uses Median instead of Mean for robustness	None	Data with existing outliers
Isolation Forest	ML-based anomaly detection	None	High-dimensional data, complex patterns
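The table gives no formula for the Modified Z-Score. A commonly used form is `M = 0.6745 × (X - median) / MAD`, with `|M| > 3.5` as the cut-off, where MAD is the median of the absolute deviations from the median. The sketch below assumes that convention and compares it with the ordinary Z-score on a small sample.

```python
import statistics

data = [10, 15, 18, 20, 22, 25, 100]

# Ordinary z-score: the mean and standard deviation are both pulled by the outlier.
mean = statistics.mean(data)
std = statistics.pstdev(data)
z_scores = [(x - mean) / std for x in data]
outliers_z = [x for x, z in zip(data, z_scores) if abs(z) > 3]

# Modified z-score: the median and MAD are barely affected by one extreme value.
med = statistics.median(data)
mad = statistics.median(abs(x - med) for x in data)  # assumes MAD > 0
modified = [0.6745 * (x - med) / mad for x in data]
outliers_modified = [x for x, m in zip(data, modified) if abs(m) > 3.5]

print(round(max(z_scores), 2))  # 2.42, so the |Z| > 3 rule misses 100
print(outliers_z)               # []
print(round(max(modified), 2))  # 10.79
print(outliers_modified)        # [100]
```

With only 7 points the ordinary Z-score cannot reach 3 at all: using the population standard deviation, the largest possible |Z| in a sample of n values is √(n - 1), about 2.45 here.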

Worked Example (IQR Method): Data: [10, 15, 18, 20, 22, 25, 100]

Q1 (25th percentile) = 15 (median of the lower half: 10, 15, 18)
Q3 (75th percentile) = 25 (median of the upper half: 22, 25, 100)
IQR = 25 - 15 = 10
Lower Bound = 15 - 1.5(10) = 0
Upper Bound = 25 + 1.5(10) = 40
100 > 40 → 100 is an outlier.
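Quartiles do not have a single definition, so software may report slightly different values from a hand calculation. The sketch below uses Python's standard `statistics.quantiles` to compare two conventions on the same data: the "exclusive" method reproduces the hand-worked values above, while the "inclusive" method interpolates linearly in the way pandas does by default.

```python
import statistics

data = [10, 15, 18, 20, 22, 25, 100]

def iqr_fences(q1, q3):
    """Return the (lower, upper) bounds of the 1.5 × IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Convention 1: matches the hand calculation (Q1 = 15, Q3 = 25).
q1_ex, _, q3_ex = statistics.quantiles(data, n=4, method="exclusive")
fences_ex = iqr_fences(q1_ex, q3_ex)

# Convention 2: linear interpolation between data points (Q1 = 16.5, Q3 = 23.5).
q1_in, _, q3_in = statistics.quantiles(data, n=4, method="inclusive")
fences_in = iqr_fences(q1_in, q3_in)

print(q1_ex, q3_ex, fences_ex)  # 15.0 25.0 (0.0, 40.0)
print(q1_in, q3_in, fences_in)  # 16.5 23.5 (6.0, 34.0)

for low, high in (fences_ex, fences_in):
    print([x for x in data if x < low or x > high])  # [100] both times
```

The bounds move, but the conclusion does not: 100 is flagged under either convention. In an exam, state which quartile method you are using.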

Treatment Decision Framework:

Action	When to Apply	Example
Remove	Data entry error or impossible value	Age = -5, Temperature = 999°C
Cap/Floor (Winsorization)	Extreme but valid values would dominate the analysis; replace them with a threshold (e.g., 99th percentile)	Capping salaries at 99th percentile for fair comparison
Transform	Apply log or square root transformation to reduce impact	Log-transforming highly skewed income data
Keep	The outlier represents a genuine, meaningful rare event	Fraud detection, rare disease identification
Separate Analysis	Analyze outliers as a distinct group	VIP customers with abnormally high spending
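To see how the "Cap/Floor" and "Transform" rows change the numbers, the sketch below applies both to the salary example from the definition above. The choice of the 1.5 × IQR fences as the capping threshold is an assumption made for illustration; a percentile cap works the same way.

```python
import math
import statistics

salaries = [40_000, 42_000, 45_000, 1_000_000, 43_000]

# Cap/Floor: clamp every value to the 1.5 × IQR fences.
q1, _, q3 = statistics.quantiles(salaries, n=4, method="inclusive")
iqr = q3 - q1                # 3000.0
lower = q1 - 1.5 * iqr       # 37500.0
upper = q3 + 1.5 * iqr       # 49500.0
capped = [min(max(s, lower), upper) for s in salaries]
print(capped)                # 1,000,000 becomes 49500.0; the rest are unchanged

# Transform: log10 compresses large values far more than small ones.
logged = [math.log10(s) for s in salaries]
ratio_raw = max(salaries) / min(salaries)
ratio_log = max(logged) / min(logged)
print(ratio_raw)             # 25.0: the largest salary is 25 times the smallest
print(round(ratio_log, 2))   # 1.3: on the log scale the gap is far smaller
```

Capping changes the extreme value itself, while the log transform keeps every value distinct and in the same order, which matters if the ranking of observations is meaningful.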

3. Python Code: Handling Missing Values

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Age':    [25, 30, np.nan, 22, 35, np.nan],
    'Salary': [50000, 60000, 45000, np.nan, 70000, 55000],
    'Gender': ['M', 'F', 'M', np.nan, 'F', 'M']
})

# 1. Detect missing values
print(df.isnull().sum())           # Count per column
print(df.isnull().mean() * 100)    # Percentage per column

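# 2. Drop rows where > 50% of values are missing
#    thresh = minimum number of non-missing values a row needs to be kept,
#    so require at least half of the columns (rounded up) to be present.
min_non_null = (len(df.columns) + 1) // 2
df_clean = df.dropna(thresh=min_non_null)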

# 3. Mean imputation (numeric columns)
#    Assign the result back: fillna(inplace=True) on a selected column is
#    deprecated in recent pandas versions and may not modify df.
df['Age'] = df['Age'].fillna(df['Age'].mean())

# 4. Median imputation (robust to outliers)
df['Salary'] = df['Salary'].fillna(df['Salary'].median())

# 5. Mode imputation (categorical columns)
df['Gender'] = df['Gender'].fillna(df['Gender'].mode()[0])

4. Python Code: Outlier Detection with IQR

import pandas as pd

data = pd.Series([10, 15, 18, 20, 22, 25, 100])

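# IQR Method
# pandas interpolates linearly between data points by default, so its quartiles
# differ slightly from the median-of-halves values (15 and 25) in the worked example.
Q1 = data.quantile(0.25)  # 16.5
Q3 = data.quantile(0.75)  # 23.5
IQR = Q3 - Q1             # 7.0

lower = Q1 - 1.5 * IQR   # 6.0
upper = Q3 + 1.5 * IQR   # 34.0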

outliers = data[(data < lower) | (data > upper)]
print("Outliers:", outliers.values)   # [100]

# Z-Score Method
from scipy import stats
z_scores = stats.zscore(data)
outliers_z = data[abs(z_scores) > 3]
# Empty here: 100 has z ≈ 2.42, because the outlier itself inflates the mean and σ
print("Z-Score Outliers:", outliers_z.values)   # []

# Capping / Winsorization (replace outliers at threshold)
data_capped = data.clip(lower=lower, upper=upper)
print("After Capping:", data_capped.values)  # 100 → 34.0

5. Exam-Ready Summary

Concept	Formula	Decision Rule
Z-Score Outlier	Z = (X-μ)/σ	Outlier if |Z| > 3
IQR Lower Bound	Q1 − 1.5×IQR	Outlier if X < Lower Bound
IQR Upper Bound	Q3 + 1.5×IQR	Outlier if X > Upper Bound
MCAR	No pattern to missingness	Safe to delete rows
MAR	Missingness depends on other columns	Impute using those columns
MNAR	Missingness depends on the missing value itself	Hardest: domain knowledge needed