Data Visualisation and Analytics — Free Notes & Tutorial
Free DVA (Data Visualization & Analytics) notes for BCA with PYQ papers at SikshaSarovar.
This Data Visualisation and Analytics course is part of Siksha Sarovar and is 100% free for students in India — no sign-up required to read. It contains 32 structured lessons with examples, and pairs with our free online compiler and AI tutor.
What you will learn
- Data visualization
- Analytics
- Charts
- Dashboards
Course content (32 lessons)
- 1.1 Unit 1 Overview — Unit 1: Overview of Data Visualisation and Analytics This unit introduces the fundamentals of data visualization, its importance, and various techniques to represent data…
- 1.2 Analytics Fundamentals — Analytics: Basic Nomenclature 1. What is Analytics? Analytics is the systematic, computational process of collecting, cleaning, analyzing, and interpreting data to discover useful…
- 1.3 Analytics Process Model — Analytics Process Model & Professional Roles 1. The Analytics Process Model The analytics process is a structured, iterative approach to solving problems using data. Study Deep:…
- 1.4 Analytical Models — Analytical Models: Requirements & Types 1. What is an Analytical Model? An analytical model is a mathematical or statistical representation of a real-world process. Study Deep:…
- 1.5 Data Collection & Sampling — Data Collection, Sampling & Distributions 1. Data Collection Sources Data collection is the foundation. The quality and type of data collected determines everything downstream.…
- 1.6 Data Quality & Outliers — Data Quality: Missing Values & Outliers 1. Missing Values Handling missing data is critical because most statistical methods and machine learning algorithms cannot process…
- 1.7 Standardization — Standardization (Feature Scaling) 1. Why Scale Data? Variables often have different units and magnitudes. Scaling puts all features on a level playing field. Study Deep: When…
- 1.8 Categorization & Segmentation — Categorization vs. Segmentation 1. Categorization Categorization is the process of assigning data points into predefined, manually specified groups based on explicit rules. Study…
- 2.1 Unit 2 Overview — Unit 2: Statistical Methods & Hypothesis Testing This unit covers the fundamental statistical techniques required for data analysis, tailored for BCA and computer science…
- 2.2 Probability Distributions in Depth — Probability Distributions: The Theory of Data Patterns 1. Mathematical Foundation A probability distribution is a mathematical function that describes the likelihood of obtaining…
- 2.3 Advanced Sampling Theory — Sampling Theory: Bridging Sample and Population 1. The Core Objective In data analytics, we rarely have access to the entire Population (N). Instead, we take a Sample (n) to…
- 2.4 Rigorous Hypothesis Testing — Hypothesis Testing: The Decision Framework 1. The Philosophical Framework Hypothesis testing operates like a criminal trial: "Innocent until proven guilty." Study Deep:…
- 2.5 Parametric Tests: Deep Dive — Parametric Tests: Z and T Distributions 1. What makes a test "Parametric"? Parametric tests assume that the underlying population follows a specific probability distribution…
- 2.6 The Mathematics of p-Values — p-Values: Evidence and Interpretation 1. Mathematical Definition of a p-Value The p-value is the exact probability of obtaining a test statistic at least as extreme as the one…
- 2.7 Confidence Intervals & Precision — Confidence Intervals: Quantifying Uncertainty 1. Point Estimates vs. Interval Estimates - Point Estimate: A single number calculated from a sample (e.g., sample mean x̄ = 45 ). It…
- 2.8 Non-Parametric: Chi-Square Test — Chi-Square (χ²): Analyzing Categorical Data 1. Parametric vs. Non-Parametric When data violates normal distribution assumptions, or when dealing with nominal/ordinal categorical…
- 2.9 Correlation & Linear Regression — Correlation & Regression: Modeling Relationships 1. Pearson Correlation Coefficient (r) Quantifies the linear relationship between two continuous variables. Study Deep: Adjusted…
- 2.10 Analysis of Variance (ANOVA) — ANOVA: Comparing Multiple Groups Rigorously 1. The Problem with Multiple T-Tests If testing 3 algorithms (A, B, C), running 3 T-tests results in Family-wise Error Rate Inflation.…
- 2.11 Statistical Paradoxes in Analytics — Statistical Paradoxes: When Math Defies Logic 1. Simpson’s Paradox A phenomenon where a trend appears in isolated subgroups of data, but disappears or reverses when the groups are…
- 3.1 Unit 3 Overview — Unit 3: Data Visualization with Python This unit focuses on the practical application of visualization libraries. You will learn to create static, animated, and interactive plots…
- 3.2 Matplotlib Basics — Data Visualization with Matplotlib: The Foundation 1. Introduction to Architecture Matplotlib is the foundational library for visualization in Python. Study Deep: Tufte's Data-Ink…
- 3.3 Advanced Matplotlib — Advanced Matplotlib: Styling, Subplots & Layouts 1. Subplots: One Figure, Multiple Graphs Often you want to compare different views side-by-side. We use plt.subplots() . Parameter…
- 3.4 Seaborn: Interface & Distributions — Study Deep: Kernel Density Estimation (KDE) A Histogram is sensitive to "bin size"—change the bin width, and the shape changes. The Solution: KDE . It smooths the data using a…
- 3.5 Seaborn: Categorical & Styling — Seaborn: Categorical Data & Aesthetics 1. Visualizing Categorical Data ( catplot ) When one variable is a category (e.g., "Day of Week") and the other is numerical (e.g., "Total…
- 4.1 Unit 4 Overview — Unit 4: GUI Programming & Database Access This final unit bridges the gap between analysis and application. You will learn to build user-friendly interfaces using Tkinter and…
- 4.2 GUI Programming with Tkinter — GUI Programming: Creating User Interfaces with Tkinter 1. Introduction to GUI (Graphical User Interface) A GUI allows users to interact with a program using visual elements like…
- 4.3 Advanced GUI Widgets — Advanced Tkinter: Selection, Menus, and Dialogs 1. Tkinter Variable Types Tkinter uses special variable classes to track widget state. They automatically update the GUI when…
- 4.4 Database Connectivity & SQL — Database Access in Python: The DB-API 1. Introduction to DB-API Python provides a standard interface called DB-API 2.0 (PEP 249) for interacting with databases. This means the…
- 4.5 CRUD Operations — Implementing CRUD in Python 1. Creating a Table Column Constraints: Constraint Meaning Example :--- :--- :--- PRIMARY KEY Unique identifier for each row id INTEGER PRIMARY KEY…
- MID Term Important Questions — MID Term Important Questions Section A – Short Answer Questions 1. Define Data Analytics. 2. What do you mean by Basic Nomenclature in Analytics? 3. Explain the Analytics Process…
- PYQ: End Term June 2024
- PYQ: End Term May/June 2025
1.1 Unit 1 Overview
Unit 1: Overview of Data Visualisation and Analytics
This unit introduces the fundamentals of data visualization, its importance, and various techniques to represent data effectively. We will explore how raw data is transformed into meaningful insights through a systematic process. By the end of this unit, you will understand the full analytics pipeline — from data collection to decision-making — and the mathematical tools used at each stage.
Topics Covered in This Unit
| # | Topic | Description | Key Concepts |
|---|---|---|---|
| 1.2 | Analytics Fundamentals | Core terminology, types of data, and the four analytics types | Structured vs. Unstructured Data, Descriptive to Prescriptive Analytics |
| 1.3 | Analytics Process Model | Step-by-step data analysis pipeline and professional roles | CRISP-DM, Data Engineer vs. Data Scientist |
| 1.4 | Analytical Models | Mathematical models for prediction and classification | Classification, Regression, Clustering, Time-Series |
| 1.5 | Data Collection & Sampling | How to gather representative data | Probability vs. Non-Probability Sampling, Central Limit Theorem |
| 1.6 | Data Quality & Outliers | Handling imperfect data | MCAR/MNAR Missingness, Z-Score & IQR Outlier Detection |
| 1.7 | Standardization | Scaling features for fair comparison | Min-Max Normalization, Z-Score Standardization, Robust Scaling |
| 1.8 | Categorization & Segmentation | Grouping data — rule-based vs. data-driven | K-Means Clustering, Silhouette Score |
Visual Overview
Figure (image not reproduced here): a structural breakdown of the key concepts covered in this unit, including the analytics process, types of data, and the role of visualization in decision-making.
1.2 Analytics Fundamentals
Analytics: Basic Nomenclature
1. What is Analytics?
Analytics is the systematic, computational process of collecting, cleaning, analyzing, and interpreting data to discover useful patterns, trends, and insights that help in decision-making. It transforms raw data into actionable intelligence using statistical methods, algorithms, and domain knowledge.
Study Deep: The DIKW Pyramid Logic
The DIKW Pyramid (Data, Information, Knowledge, Wisdom) represents the structural hierarchy of how we process raw facts into strategic decisions.
- Data: The raw, atomic facts (e.g., "102").
- Information: Data with context (e.g., "102 is the temperature in Fahrenheit").
- Knowledge: Information with experience (e.g., "102°F means the patient has a high fever").
- Wisdom: Knowledge with judgment (e.g., "Administer paracetamol and monitor the patient").
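As a rough illustration of the four levels, the temperature example above can be traced through a few small functions. This is only a sketch: the function names and the 100.4°F cutoff are assumptions made for the example and are not clinical guidance.

```python
# Hypothetical walk through the DIKW levels using the temperature example.
FEVER_THRESHOLD_F = 100.4  # assumed cutoff, for illustration only


def to_information(raw_value: float) -> dict:
    """Data -> Information: attach context (what is measured, in which unit)."""
    return {"measure": "body_temperature", "unit": "F", "value": raw_value}


def to_knowledge(info: dict) -> str:
    """Information -> Knowledge: interpret the reading with a domain rule."""
    return "fever" if info["value"] >= FEVER_THRESHOLD_F else "normal"


def to_wisdom(state: str) -> str:
    """Knowledge -> Wisdom: decide what to do about the interpretation."""
    if state == "fever":
        return "administer paracetamol and monitor the patient"
    return "no action needed"


data = 102                              # raw fact, meaningless on its own
information = to_information(data)      # the same number with context
knowledge = to_knowledge(information)   # interpretation
wisdom = to_wisdom(knowledge)           # decision

print(data, information, knowledge, wisdom, sep="\n")
```

Each function adds exactly one layer, which mirrors how the pyramid is usually read from bottom to top.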
2. Data vs. Information vs. Knowledge
Understanding this hierarchy is fundamental:
| Concept | Definition | Example | Characteristics |
|---|---|---|---|
| Data | Raw, unprocessed facts and figures without context | 45, "Red", 12-07-2025 | Objective, unorganized, meaningless alone |
| Information | Data that has been processed, organized, and given context | "The red car was sold on 12-07-2025 for $45,000" | Contextual, organized, answers Who/What/When |
| Knowledge | Information combined with experience and judgment | "Red cars sell 20% faster in summer; stock more for Q2" | Actionable, experience-driven, answers How/Why |
| Wisdom | Applying knowledge ethically and strategically | "We should focus marketing on red cars in spring to maximize summer sales" | Strategic, forward-looking, answers "What's best?" |
This hierarchy is known as the DIKW Pyramid (Data → Information → Knowledge → Wisdom).
3. Types of Data
Data can be classified along multiple dimensions. The three foundational categories are:
| Feature | Structured Data | Unstructured Data | Semi-Structured Data |
|---|---|---|---|
| Format | Highly organized, fixed schema | No predefined format | Partially organized (tags/markers) |
| Storage | Relational Databases (SQL), Spreadsheets | Data Lakes, NoSQL, File Systems | Document databases, JSON/XML files, mail stores (email = structured header + free-text body) |
| Examples | Student records, bank transactions, inventory | Emails, social media posts, videos, images | JSON API responses, HTML pages, log files |
| Ease of Analysis | Easy — direct queries with SQL | Difficult — requires NLP, Computer Vision | Moderate — requires parsing |
| Share of All Data (commonly cited rough estimate) | ~20% | ~80% | Varies |
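The difference in "ease of analysis" can be sketched with Python's standard library. The sample records below are made up for illustration: the structured (CSV) rows can be aggregated directly, while the semi-structured (JSON) records need a parsing step because their fields are not guaranteed to be present.

```python
import csv
import io
import json

# Structured: fixed schema, every row has the same columns.
csv_text = "id,name,marks\n1,Asha,82\n2,Ravi,74\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total_marks = sum(int(row["marks"]) for row in rows)  # direct aggregation

# Semi-structured: tagged fields, but records may differ in shape.
json_text = '[{"id": 1, "name": "Asha", "tags": ["sports"]}, {"id": 2, "name": "Ravi"}]'
records = json.loads(json_text)

# Parsing step: normalise the optional "tags" field before analysis.
flat = [
    {"id": r["id"], "name": r["name"], "tag_count": len(r.get("tags", []))}
    for r in records
]

print(total_marks)
print(flat)
```

Unstructured data (free text, images, video) has no such fields to read at all, which is why it typically needs techniques such as NLP or computer vision first.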
Data can also be classified by measurement scale:
- Nominal: Categories without order (e.g., Color: Red, Blue, Green).
- Ordinal: Categories with a meaningful order but unequal intervals (e.g., Rating: Low, Medium, High).
- Interval: Numeric with equal intervals but no true zero (e.g., Temperature in °C: 0°C ≠ "no heat").
- Ratio: Numeric with equal intervals AND a true zero (e.g., Weight: 0 kg = no weight).
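The measurement scale determines which summary statistics make sense. The sketch below uses small made-up samples and the standard `statistics` module. The rank mapping for the ordinal ratings is an assumption introduced only to find the median.

```python
from statistics import mean, median, mode

colors = ["Red", "Blue", "Red", "Green"]             # nominal
ratings = ["Low", "High", "Medium", "High", "Low"]   # ordinal
temps_c = [10.0, 20.0, 30.0]                         # interval
weights_kg = [40.0, 80.0]                            # ratio

# Nominal: only counting is meaningful, so the mode is the usable summary.
most_common_color = mode(colors)

# Ordinal: order is meaningful, so the median works once labels are ranked.
rank = {"Low": 1, "Medium": 2, "High": 3}
label = {value: key for key, value in rank.items()}
median_rating = label[median(rank[r] for r in ratings)]

# Interval: means and differences are meaningful, ratios are not.
mean_temp = mean(temps_c)
celsius_ratio = temps_c[1] / temps_c[0]                      # 2.0, but misleading
kelvin_ratio = (temps_c[1] + 273.15) / (temps_c[0] + 273.15)  # about 1.035

# Ratio: a true zero makes ratios meaningful and independent of the unit.
kg_ratio = weights_kg[1] / weights_kg[0]
gram_ratio = (weights_kg[1] * 1000) / (weights_kg[0] * 1000)

print(most_common_color, median_rating, mean_temp)
print(celsius_ratio, round(kelvin_ratio, 3), kg_ratio, gram_ratio)
```

The Celsius "ratio" changes when the unit changes, so 20°C is not "twice as hot" as 10°C, whereas 80 kg is twice 40 kg in any unit.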
4. The Four Types of Analytics
Analytics is categorized into four types, progressing in both complexity and business value:
| Type | Core Question | Techniques | Example | Value Level |
|---|---|---|---|---|
| Descriptive | What happened? | Averages, percentages, dashboards, charts | "Sales dropped by 10% last month" | Low (Hindsight) |
| Diagnostic | Why did it happen? | Drill-down, data discovery, correlations, root cause analysis | "Sales dropped because a competitor launched a cheaper product" | Medium (Insight) |
| Predictive | What is likely to happen? | Regression, forecasting, ML models, time-series analysis | "Sales are likely to drop another 5% next month" | High (Foresight) |
| Prescriptive | What should we do? | Optimization, simulation, decision trees, A/B testing | "Lower prices by 15% to regain market share" | Very High (Action) |
Analytics Maturity Model: Most organizations start at Descriptive and progressively adopt more advanced types. Few reach the Prescriptive stage; the figure quoted in these notes is roughly 3% of enterprises, which should be read as an approximate estimate.
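The first and third rows of the table can be sketched on a tiny, made-up sales series. The descriptive step summarises what happened, and the predictive step fits a straight line by ordinary least squares to extrapolate one month ahead. The numbers and variable names are hypothetical, and a real forecast would need far more data and validation.

```python
monthly_sales = [120.0, 115.0, 110.0, 99.0]   # hypothetical units sold, months 1..4
months = list(range(1, len(monthly_sales) + 1))

# Descriptive: what happened?
average_sales = sum(monthly_sales) / len(monthly_sales)
pct_change_last_month = (
    (monthly_sales[-1] - monthly_sales[-2]) / monthly_sales[-2] * 100
)

# Predictive: what is likely to happen? Fit y = intercept + slope * x.
x_mean = sum(months) / len(months)
y_mean = average_sales
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(months, monthly_sales))
s_xx = sum((x - x_mean) ** 2 for x in months)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean
forecast_month_5 = intercept + slope * 5

print(f"Average sales: {average_sales:.1f}")
print(f"Change last month: {pct_change_last_month:.1f}%")
print(f"Forecast for month 5: {forecast_month_5:.1f}")
```

Diagnostic and prescriptive analytics are left out of the sketch because they need extra context (possible causes, available actions and their costs) that a single sales column does not contain.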
5. Key Terms Glossary
| Term | Definition | Example |
|---|---|---|
| Dataset | A collection of related data organized in rows and columns | A table of student marks |
| Variable (Feature) | A characteristic that can vary across observations | Age, Height, Income |
| Observation (Record) | A single row in a dataset representing one entity | One student's complete data |
| Insight | A valuable, actionable conclusion drawn from analysis | "Customers buy more on weekends" |
| KPI (Key Performance Indicator) | A measurable value that shows progress toward a goal | Monthly Revenue, Customer Churn Rate |
| Metric | A quantifiable measure used to track performance | Average Order Value, Click-Through Rate |
| Dimension | A categorical attribute used to slice data | Region, Product Category, Time Period |
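The glossary terms fit together as follows in a minimal sketch. The order records and the helper name `average_order_value` are hypothetical, chosen only to show a metric being sliced by a dimension.

```python
from collections import defaultdict

# Dataset: a collection of observations (records).
# Each key is a variable; "region" and "category" are dimensions.
orders = [
    {"region": "North", "category": "Books", "amount": 250.0},
    {"region": "North", "category": "Toys", "amount": 150.0},
    {"region": "South", "category": "Books", "amount": 300.0},
    {"region": "South", "category": "Books", "amount": 200.0},
]


def average_order_value(dataset, dimension):
    """Metric (average order value) sliced by the given dimension."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for observation in dataset:
        key = observation[dimension]
        totals[key] += observation["amount"]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


aov_by_region = average_order_value(orders, "region")
aov_by_category = average_order_value(orders, "category")

print(aov_by_region)
print(aov_by_category)
```

A metric like this becomes a KPI once it is tied to a goal, and an insight is the conclusion drawn from comparing the slices.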