Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

Privacy Policy | Terms of Service | Contact Siksha Sarovar | About Siksha Sarovar

v4.0.9 · PWA
Siksha Sarovar logo
Siksha Sarovar
Your Learning Universe

Siksha Sarovar is a free e-learning platform for coding courses, BCA university notes and competitive exam preparation. Optional Google sign-in saves your learning progress across devices.

Initializing knowledge base…
Compiling modules 0%

What is Data Science?

Lesson 2 of 37 in the free Data Science notes on Siksha Sarovar, written by Rohit Jangra.

What is Data Science?

Data Science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract meaningful knowledge and insights from noisy, structured, and unstructured data. It sits at the intersection of three core domains:

  • Statistics & Mathematics: For modeling, probability, and quantitative analysis.
  • Computer Science & Programming: For building algorithms, automating tasks, and handling large datasets.
  • Domain Knowledge: For understanding the real-world context in which data exists.

Formal Definition

Data Science is the systematic study of data through the application of quantitative and analytical approaches to derive actionable insights. It encompasses a broad range of techniques from descriptive statistics to advanced machine learning. Unlike traditional business analytics, Data Science seeks not only to understand what happened in the past, but also to predict what will happen in the future and prescribe optimal actions.

Why Data Science Matters

The explosion of digital data in the 21st century has created an unprecedented need for skilled professionals who can turn raw data into strategic value. According to industry research:

  • Every day, approximately 2.5 quintillion bytes of data are created.
  • Over 90% of the world's data has been generated in just the last two years.
  • Companies that leverage data-driven decision-making are, on average, 5% more productive and 6% more profitable than their competitors.

Data Science provides the tools and methodologies to harness this data deluge and convert it into a competitive advantage.

Core Pillars of Data Science

PillarDescriptionExample
StatisticsFoundation for understanding data distributions, sampling, and hypothesis testingA/B Testing on a website
Machine LearningAlgorithms that learn from data to make predictions or decisionsSpam email filter
Data EngineeringInfrastructure to collect, store, and process large datasetsBuilding a data pipeline
VisualizationPresenting insights in a clear, actionable formatInteractive dashboards
Domain ExpertiseContextual understanding of the business or fieldMedical diagnosis rules

Data Science vs Related Fields

It is important to distinguish Data Science from closely related fields:

FeatureData ScienceArtificial IntelligenceMachine LearningStatistics
GoalExtract insights from dataSimulate human intelligenceLearn patterns from dataAnalyze and interpret data
ScopeBroad — encompasses ML, Stats, EngineeringBroad — includes ML, NLP, RoboticsSubset of AIFoundation of Data Science
OutputInsights, Predictions, ReportsIntelligent SystemsPredictive ModelsEstimates, Hypothesis Tests
ExampleCustomer churn analysisSelf-driving carEmail spam classifierClinical trial analysis

Key Terminology

  • Dataset: A structured collection of data, often represented as a table with rows (records) and columns (features).
  • Feature (Variable): An individual measurable property of the data (e.g., Age, Income, Temperature).
  • Label (Target): The outcome variable that a model tries to predict (e.g., "Yes/No" for fraud).
  • Model: A mathematical representation of a real-world process, trained on data to make predictions.
  • Algorithm: A step-by-step procedure for solving a problem or performing a computation.

The Data Science Venn Diagram

Data Science is famously represented as the intersection of three circles:

  1. Hacking Skills (Computer Science): The ability to write code, manipulate data, and use tools.
  2. Math & Statistics Knowledge: Understanding the theory behind the models and analysis.
  3. Substantive Expertise (Domain Knowledge): Knowing which questions to ask and how to interpret results in context.

The "sweet spot" where all three overlap is where true Data Science happens. Without domain knowledge, you may build accurate but meaningless models. Without statistics, your conclusions may be flawed. Without programming, you cannot implement your ideas at scale.

Summary

  • Data Science is an interdisciplinary field combining math, computing, and domain knowledge.
  • It aims to extract actionable insights from data.
  • It is distinct from, but related to, AI, ML, and traditional statistics.
  • The field is driven by the massive growth of data in the modern world.