Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

End Term Important Questions

Lesson 30 of 36 in the free Big Data-1 notes on Siksha Sarovar, written by Rohit Jangra.

End Term Important Questions — PYQ Analysis

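This list is based on an analysis of the last three end-term papers (Dec 2021, Dec 2024 and Dec 2025). Questions marked ★ Must Do are the highest-priority repeats: most of them appeared in all three papers, and Building blocks of Hadoop appeared in two, both times for 10 marks. Treat them as sure-shot questions and prepare them first.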

Complete Question Bank (Priority-Wise)

# | Question | Unit | Marks | Times Appeared | Priority
1 | Define the characteristics of Big Data (Four/Five V's) | Unit 1 | 1.5 / 5 / 10 | 3 | ★ Must Do
2 | What is the role of NameNode and DataNode in HDFS? | Unit 2 | 1.5 / 10 | 3 | ★ Must Do
3 | Architecture of HDFS / Draw and explain HDFS Architecture | Unit 2 | 5 / 10 / 15 | 3 | ★ Must Do
4 | MapReduce paradigm / Word Count program / MapReduce Architecture | Unit 3 | 5 / 10 / 15 | 3 | ★ Must Do
5 | Architecture of YARN / YARN components and job scheduling | Unit 3 | 5 / 10 | 3 | ★ Must Do
6 | Building blocks of Hadoop | Unit 2 | 10 | 2 | ★ Must Do
7 | Fault tolerance in HDFS / How is high availability achieved in HDFS | Unit 2 | 5 / 10 | 3 | ★ Must Do
8 | Differentiate between Hive and Pig / Why Hive is preferred over MapReduce | Unit 4 | 5 / 1.5 | 3 | ★ Must Do
9 | Shuffle and Sort mechanism in MapReduce (also asked as a short note) | Unit 3 | 5 | 2 | Important
10 | Role of Job Tracker / Task Tracker in Hadoop | Unit 3 | 1.5 / 5 | 3 | Important
11 | How does the partitioner decide which reducer receives a key? | Unit 3 | 1.5 | 2 | Important
12 | Apache Pig components and role of Pig in the Hadoop ecosystem | Unit 4 | 5 | 2 | Important
13 | Wrapper classes in Java / Concept of Wrapper Classes | Unit 5 | 5 / 10 | 2 | Important
14 | Serialization and Deserialization in Java / Serialize and persist to file | Unit 5 | 5 / 10 / 15 | 2 | Important
15 | Generics in Java / Difference between generics and wrapper classes | Unit 5 | 5 / 10 | 2 | Important
16 | Pseudo-distributed mode configuration of Hadoop cluster | Unit 2 | 1.5 | 2 | Moderate
17 | History of Big Data / Major events in the Big Data era in the 2000s | Unit 1 | 1.5 / 10 | 2 | Moderate
18 | Technology challenges for Big Data / Challenges of unstructured Big Data | Unit 1 | 1.5 | 2 | Moderate
19 | Heartbeat signal in HDFS | Unit 2 | 1.5 | 2 | Moderate
20 | Default block size in HDFS | Unit 2 | 1.5 | 2 | Moderate
21 | Singly linked list to implement Stack and Queue in Java | Unit 5 | 10 | 1 | Moderate
22 | Big Data transforming healthcare and finance sectors | Unit 1 | 5 | 1 | Moderate
23 | Differentiate between GFS and HDFS | Unit 2 | 5 | 1 | Moderate
24 | Generic method syntax in Java / Generic types in Java | Unit 5 | 1.5 | 2 | Moderate
25 | What is serialization / peek() function in Java | Unit 5 | 1.5 | 2 | Moderate
26 | What is Hadoop streaming? | Unit 3 | 1.5 | 1 | Normal
27 | Commodity hardware in Hadoop | Unit 2 | 1.5 | 1 | Normal
28 | MapReduce programming model with real-world example | Unit 3 | 10 | 1 | Normal
29 | Linked list data structure: working, and concept of wrapper classes | Unit 5 | 10 / 5 | 1 | Normal
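Questions 4, 9 and 11 all concern the same data flow. The sketch below is plain Java, not Hadoop API code: it imitates the map, partition, shuffle-and-sort and reduce phases of Word Count with in-memory collections. The class and method names (WordCountSketch, map, partition, run) are hypothetical. The partition method uses the same formula as Hadoop's default HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks; because a real job hashes Hadoop Text keys rather than Java Strings, the reducer a given word lands on will differ there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * In-memory sketch of the MapReduce Word Count data flow:
 * map -> partition -> shuffle & sort -> reduce.
 * Not Hadoop code; it only mimics the phases with plain Java collections.
 */
class WordCountSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // clear the sign bit, then take the remainder by the number of reducers.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Map phase: emit (word, 1) for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Runs all phases and returns one sorted word-count map per reducer.
    static List<TreeMap<String, Integer>> run(List<String> lines, int numReducers) {
        // Shuffle & sort: each reducer gets a TreeMap, so its keys arrive sorted
        // and all values emitted for the same key are grouped together.
        List<TreeMap<String, List<Integer>>> shuffled = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            shuffled.add(new TreeMap<>());
        }
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                int r = partition(kv.getKey(), numReducers);
                shuffled.get(r).computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        // Reduce phase: sum the grouped 1s for each word.
        List<TreeMap<String, Integer>> results = new ArrayList<>();
        for (TreeMap<String, List<Integer>> group : shuffled) {
            TreeMap<String, Integer> counts = new TreeMap<>();
            group.forEach((word, ones) -> counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
            results.add(counts);
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> input = List.of("deer bear river", "car car river", "deer car bear");
        List<TreeMap<String, Integer>> out = run(input, 2);
        for (int r = 0; r < out.size(); r++) {
            System.out.println("reducer " + r + ": " + out.get(r));
        }
    }
}
```

Running main prints one line per reducer. Every occurrence of a word reaches the same reducer, and within each reducer the words appear in sorted order, which is the guarantee the shuffle-and-sort phase gives each reducer.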

Exam Predictions Based on PYQ Trends

🔴 Very High Probability (90–95% confidence)

  • HDFS Architecture with diagram: appeared in all three papers, usually as a long question (5, 10 or 15 marks)
  • MapReduce + Word Count program: the most comprehensive question, asked for up to 15 marks
  • 5 V's of Big Data: a Part-A staple, usually for 1.5 marks (also asked for 5 or 10 marks)
  • NameNode and DataNode roles: appears in every paper, in Part-A or Part-B
  • YARN Architecture: repeated across all three papers
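The NameNode/DataNode split can be pictured as a metadata table kept by the NameNode, while the DataNodes hold the actual bytes. The toy model below is a sketch under stated assumptions, not HDFS code: ToyNameNode and its methods are hypothetical, replicas are placed round-robin instead of with HDFS's rack-aware policy, and only the defaults it uses (128 MB blocks, replication factor 3) come from the Key Numbers table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of NameNode metadata: which blocks make up a file and which
 * DataNodes hold each replica. Real HDFS uses rack-aware placement;
 * here replicas are assigned round-robin (an assumption for illustration).
 */
class ToyNameNode {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // 128 MB default (Hadoop 2.x and later)
    static final int REPLICATION = 3;                  // default dfs.replication

    private final List<String> dataNodes;
    // path -> list of blocks -> DataNodes holding that block (metadata only, no file data)
    private final Map<String, List<List<String>>> blockMap = new LinkedHashMap<>();
    private int next = 0; // round-robin cursor over the DataNodes

    ToyNameNode(List<String> dataNodes) {
        this.dataNodes = dataNodes;
    }

    // Splits a file into fixed-size blocks and records REPLICATION distinct DataNodes per block.
    // The last block may be partial, e.g. a 300 MB file becomes 128 + 128 + 44 MB.
    List<List<String>> addFile(String path, long sizeBytes) {
        long numBlocks = (sizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE; // ceiling division
        int replicas = Math.min(REPLICATION, dataNodes.size());
        List<List<String>> blocks = new ArrayList<>();
        for (long b = 0; b < numBlocks; b++) {
            List<String> holders = new ArrayList<>();
            for (int i = 0; i < replicas; i++) {
                holders.add(dataNodes.get((next + i) % dataNodes.size()));
            }
            next = (next + 1) % dataNodes.size(); // shift the start node so load spreads out
            blocks.add(holders);
        }
        blockMap.put(path, blocks);
        return blocks;
    }

    // What a client asks the NameNode for before reading: the block locations of a file.
    List<List<String>> getBlockLocations(String path) {
        return blockMap.get(path);
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode(List.of("dn1", "dn2", "dn3", "dn4"));
        List<List<String>> blocks = nn.addFile("/data/logs.txt", 300L * 1024 * 1024);
        for (int i = 0; i < blocks.size(); i++) {
            System.out.println("block " + i + " -> " + blocks.get(i));
        }
    }
}
```

For a 300 MB file on four DataNodes, main prints three blocks: block 0 on [dn1, dn2, dn3], block 1 on [dn2, dn3, dn4] and block 2 on [dn3, dn4, dn1]. Losing any single DataNode still leaves two copies of every block, which is the basis of HDFS fault tolerance.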

🟡 High Probability (70–85% confidence)

  • Hive vs Pig differentiation: appeared in all three papers, most likely as a 5-mark comparison
  • Shuffle and Sort mechanism: appeared twice as a 5-mark question
  • Building Blocks of Hadoop: appeared twice, both times as a 10-mark question
  • Serialization in Java with code: appeared in two papers
  • Role of Job Tracker / Task Tracker: asked for 1.5 or 5 marks
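For the Serialization question, a minimal write-to-file and read-back round trip could look like the following. The Student class, its fields and the temporary file are illustrative assumptions; ObjectOutputStream, ObjectInputStream and Serializable are the standard JDK classes.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

/** Minimal serialize-to-file / deserialize-from-file round trip. */
class SerializationDemo {

    // Serializable is a marker interface: it declares no methods.
    static class Student implements Serializable {
        private static final long serialVersionUID = 1L; // checked when the object is read back
        final String name;
        final int marks;
        transient String sessionToken; // transient fields are skipped during serialization

        Student(String name, int marks, String sessionToken) {
            this.name = name;
            this.marks = marks;
            this.sessionToken = sessionToken;
        }
    }

    // Serialization: object -> byte stream -> file.
    static void save(Student s, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(s);
        }
    }

    // Deserialization: file -> byte stream -> object (the constructor is not called).
    static Student load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Student) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("student", ".ser");
        save(new Student("Asha", 87, "abc123"), file);
        Student copy = load(file);
        System.out.println(copy.name + " " + copy.marks + " " + copy.sessionToken);
        Files.delete(file);
    }
}
```

Running main prints `Asha 87 null`: name and marks survive the round trip, while the transient field comes back as null because it was never written to the file.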

🟢 Moderate Probability (50–65% confidence)

  • Apache Pig architecture and real-world example
  • Generics vs Wrapper Classes
  • History of Big Data / milestones
  • Linked List, Stack, Queue in Java using Generics
  • GFS vs HDFS differentiation
  • Challenges of unstructured Big Data
  • Short note on YARN / Hive and Pig
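For the Linked List, Stack and Queue question (and generic-method syntax), a singly linked list can back both structures. This is one common textbook approach, sketched on the assumption that a hand-written list is wanted rather than java.util.LinkedList; the class names are hypothetical. It also shows the generics/wrapper-class link: a type parameter such as T cannot be a primitive, so a stack of numbers is declared as LinkedStack<Integer>, and int values are autoboxed into the Integer wrapper class.

```java
import java.util.NoSuchElementException;

/** Singly linked list used as a generic Stack (LIFO) and Queue (FIFO). */
class LinkedStructures {

    // One node of a singly linked list; T is the generic element type.
    private static class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    // Stack: push, pop and peek all work at the head, so each is O(1).
    static class LinkedStack<T> {
        private Node<T> head;

        void push(T value) {
            Node<T> n = new Node<>(value);
            n.next = head;
            head = n;
        }

        T pop() {
            T v = peek(); // throws if the stack is empty
            head = head.next;
            return v;
        }

        // peek() returns the top element without removing it.
        T peek() {
            if (head == null) throw new NoSuchElementException("stack is empty");
            return head.value;
        }

        boolean isEmpty() { return head == null; }
    }

    // Queue: enqueue at the tail, dequeue at the head, both O(1).
    static class LinkedQueue<T> {
        private Node<T> head, tail;

        void enqueue(T value) {
            Node<T> n = new Node<>(value);
            if (tail == null) { head = n; } else { tail.next = n; }
            tail = n;
        }

        T dequeue() {
            T v = peek(); // throws if the queue is empty
            head = head.next;
            if (head == null) tail = null; // queue became empty
            return v;
        }

        T peek() {
            if (head == null) throw new NoSuchElementException("queue is empty");
            return head.value;
        }

        boolean isEmpty() { return head == null; }
    }

    // A generic method: <T> before the return type declares its own type parameter.
    static <T> void pushAll(LinkedStack<T> stack, T[] items) {
        for (T item : items) stack.push(item);
    }

    public static void main(String[] args) {
        LinkedStack<Integer> stack = new LinkedStack<>();
        pushAll(stack, new Integer[] {10, 20, 30}); // int literals autoboxed to Integer
        System.out.println(stack.pop() + " " + stack.peek()); // 30 20

        LinkedQueue<String> queue = new LinkedQueue<>();
        queue.enqueue("a");
        queue.enqueue("b");
        System.out.println(queue.dequeue() + " " + queue.peek()); // a b
    }
}
```

The stack only ever touches the head, while the queue keeps a separate tail pointer so that enqueue does not have to walk the whole list.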

5-Hour Revision Plan

Hour | Topics
Hour 1 | Big Data V's + HDFS Architecture + NameNode/DataNode
Hour 2 | MapReduce + Word Count + Shuffle & Sort
Hour 3 | YARN + Building Blocks of Hadoop + Fault Tolerance
Hour 4 | Hive + Pig + HiveQL vs Pig Latin comparison
Hour 5 | Java: Serialization + Generics + Wrapper Classes + diagram review

⚡ Key Numbers to Remember

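Fact | Value
HDFS Block Size | 128 MB (Hadoop 2.x and later), 64 MB (Hadoop 1.x)
Replication Factor | 3 (default, configurable via dfs.replication)
Heartbeat Interval | Every 3 seconds; with default settings a DataNode is marked dead after about 10.5 minutes (630 s) without a heartbeat
Block Report | Every 6 hours (complete block inventory)
Hadoop Origin | Grew out of Apache Nutch (Doug Cutting and Mike Cafarella, around 2005); became a separate Apache project in 2006
YARN Introduced | Hadoop 2.x (2013), replaced the JobTracker/TaskTracker model
Spark Speed | Up to 100x faster than MapReduce for in-memory workloads (the figure Apache Spark advertises)
Checksum Type | CRC-32C by default, computed for every 512-byte chunk (dfs.bytes-per-checksum) and verified on read
Java Serialization | implements Serializable (marker interface)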
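The values above combine into the kind of arithmetic that short answers often need. The sketch below assumes default settings throughout. The dead-node timeout uses the NameNode's rule of 2 × dfs.namenode.heartbeat.recheck-interval (5 minutes by default) plus 10 × dfs.heartbeat.interval, which is where the commonly quoted figure of roughly 10 minutes comes from. HdfsNumbers is a hypothetical helper class, not part of Hadoop.

```java
/** Back-of-the-envelope HDFS arithmetic using default configuration values. */
class HdfsNumbers {
    static final long MB = 1024L * 1024;
    static final long BLOCK_SIZE = 128 * MB; // dfs.blocksize default (Hadoop 2.x and later)
    static final int REPLICATION = 3;        // dfs.replication default
    static final long HEARTBEAT_SEC = 3;     // dfs.heartbeat.interval default
    static final long RECHECK_MS = 300_000;  // dfs.namenode.heartbeat.recheck-interval default (5 min)

    // Number of HDFS blocks for a file: ceiling(size / blockSize).
    static long blocks(long fileBytes) {
        return (fileBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Raw disk used across the cluster: every byte is stored REPLICATION times.
    // A partial last block occupies only its actual size, not a full 128 MB.
    static long rawBytes(long fileBytes) {
        return fileBytes * REPLICATION;
    }

    // Seconds of silence before the NameNode declares a DataNode dead:
    // 2 * recheck interval + 10 * heartbeat interval.
    static long deadNodeTimeoutSec() {
        return 2 * RECHECK_MS / 1000 + 10 * HEARTBEAT_SEC;
    }

    public static void main(String[] args) {
        long file = 500 * MB;
        System.out.println("blocks = " + blocks(file));                      // 4
        System.out.println("raw storage MB = " + rawBytes(file) / MB);       // 1500
        System.out.println("dead-node timeout s = " + deadNodeTimeoutSec()); // 630
    }
}
```

For a 500 MB file this gives 4 blocks (three full 128 MB blocks plus a 116 MB block), 1500 MB of raw storage across the cluster, and a 630-second (10.5-minute) dead-node timeout.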