Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

Top 30 Definitions & 50 Viva Questions

Lesson 32 of 36 in the free Big Data-1 notes on Siksha Sarovar, written by Rohit Jangra.

Top 30 Definitions (Must-Know for Short Questions)

| # | Term | Definition |
|---|------|------------|
| 1 | Big Data | Extremely large datasets that traditional tools cannot manage; characterized by the 5 V's: Volume, Velocity, Variety, Veracity, Value |
| 2 | HDFS | Hadoop Distributed File System: the primary Hadoop storage layer; splits large files into blocks stored across machines, with replication-based fault tolerance |
| 3 | NameNode | HDFS master that stores metadata only (file names, block locations, permissions); manages the FsImage and EditLog |
| 4 | DataNode | HDFS worker (slave) nodes that store the actual data blocks; each sends a heartbeat every 3 seconds |
| 5 | Secondary NameNode | NOT a backup; periodically merges FsImage + EditLog (checkpointing) |
| 6 | MapReduce | Programming model for parallel processing: Map phase (emits key-value pairs) + Reduce phase (aggregation) |
| 7 | YARN | Yet Another Resource Negotiator: the Hadoop 2.x resource layer, which separates the cluster-wide ResourceManager from per-application ApplicationMasters |
| 8 | Heartbeat | DataNode→NameNode liveness signal sent every 3 seconds; roughly 10 minutes of silence marks the node dead |
| 9 | Replication Factor | Number of copies of each block kept in HDFS; default 3 |
| 10 | Rack Awareness | Placing replicas on different racks so data survives a whole-rack failure |
| 11 | Hive | Hadoop data warehouse with SQL-like queries (HiveQL) compiled into MapReduce jobs |
| 12 | HiveQL | Hive Query Language: SQL-like language over HDFS data |
| 13 | Pig | High-level data-flow platform (Pig Latin) for ETL; compiles to MapReduce |
| 14 | Pig Latin | Pig's scripting language: LOAD, FILTER, FOREACH, GROUP, JOIN, STORE |
| 15 | HBase | NoSQL column-oriented database on HDFS with real-time read/write; follows the Bigtable model |
| 16 | Sqoop | Bidirectional data transfer between Hadoop and an RDBMS ("SQL-to-Hadoop") |
| 17 | Flume | Distributed service that streams log data from many sources into HDFS in real time |
| 18 | Mapper | First MapReduce function: reads input and emits intermediate key-value pairs in parallel |
| 19 | Reducer | Second MapReduce function: aggregates the grouped pairs after Shuffle & Sort into the final output |
| 20 | Combiner | Optional mini-reducer run on each mapper node; shrinks data before shuffling |
| 21 | Partitioner | Routes keys to reducers via hash(key) % numReducers, so the same key always reaches the same reducer |
| 22 | Shuffle and Sort | Intermediate phase that groups all values per key and sorts by key before reducing |
| 23 | Serialization (Java) | Converting an object to a byte stream (the class implements java.io.Serializable); the reverse is deserialization |
| 24 | Generics (Java) | Type parameters for classes/methods that give compile-time type safety, e.g. List<Integer>, Box<T> |
| 25 | Wrapper Classes | Objects wrapping primitives (Integer, Double…) so they work in Collections; enable autoboxing/unboxing |
| 26 | Autoboxing | Automatic primitive → wrapper conversion (int → Integer) |
| 27 | Commodity Hardware | Inexpensive off-the-shelf machines that Hadoop clusters are built from |
| 28 | FsImage | NameNode's snapshot file of the complete HDFS namespace at a point in time |
| 29 | Block Report | Full block inventory that a DataNode sends the NameNode every 6 hours |
| 30 | Pseudo-Distributed Mode | All Hadoop daemons run on one machine as separate processes; simulates a cluster for development |
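
Definitions 6, 18, 19 and 22 describe MapReduce as three steps: map, shuffle and sort, then reduce. The following is a minimal single-process sketch of that flow in plain Java, using word count as the example. It does not use any Hadoop API; the class name `WordCountSketch` and its method names are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Map phase: turn one input line into intermediate (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle and Sort: group all values by key; the TreeMap keeps keys sorted.
    public static TreeMap<String, List<Integer>> shuffleAndSort(
            List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: aggregate the list of values for one key into a single count.
    public static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in order on a small in-memory input.
    public static TreeMap<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        shuffleAndSort(intermediate).forEach((k, v) -> result.put(k, reduce(k, v)));
        return result;
    }

    public static void main(String[] args) {
        // Prints {big=2, data=2, lake=1}
        System.out.println(wordCount(List.of("big data big", "data lake")));
    }
}
```

In a real cluster the map calls run in parallel on different nodes and the grouped data travels over the network; this sketch only mirrors the order of the phases.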

Top 50 Viva Questions

HDFS & Hadoop Basics

  1. What are the 5 V's of Big Data?
  2. What is HDFS and why is it used?
  3. What is the default block size in HDFS?
  4. What is the role of NameNode?
  5. Is Secondary NameNode a backup of NameNode?
  6. What is the default replication factor in HDFS?
  7. How often does a DataNode send a Heartbeat?
  8. What is Rack Awareness?
  9. How is fault tolerance achieved in HDFS?
  10. What are FsImage and EditLog?
  11. What is a Block Report?
  12. What is GFS and how does it relate to HDFS?
  13. What is commodity hardware in Hadoop?
  14. What is Pseudo-Distributed mode?
  15. What is data locality in Hadoop?
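
Several of the questions above (block size, replication factor, heartbeat) come down to simple arithmetic. The sketch below works through it under stated assumptions: a hypothetical 500 MB file, a 128 MB block size (the usual default in Hadoop 2.x and later, not stated in these notes), and the stock HDFS dead-node rule of 2 × recheck interval + 10 × heartbeat interval with a 5-minute recheck interval. The class and method names are hypothetical.

```java
public class HdfsArithmeticSketch {

    // Number of blocks needed for a file: ceiling of fileSize / blockSize.
    public static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Raw cluster storage consumed: every byte is stored once per replica.
    public static long rawStorageBytes(long fileSizeBytes, int replicationFactor) {
        return fileSizeBytes * replicationFactor;
    }

    // Silence needed before a DataNode is declared dead (assumed stock rule):
    // 2 * recheck interval + 10 * heartbeat interval.
    public static long deadNodeTimeoutSeconds(long recheckSeconds, long heartbeatSeconds) {
        return 2 * recheckSeconds + 10 * heartbeatSeconds;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 500 * mb;   // hypothetical file
        long blockSize = 128 * mb;  // assumed default block size

        System.out.println(blockCount(fileSize, blockSize));     // prints 4
        System.out.println(rawStorageBytes(fileSize, 3) / mb);   // prints 1500
        System.out.println(deadNodeTimeoutSeconds(300, 3));      // prints 630
    }
}
```

With these assumed defaults the timeout works out to 630 seconds, which is why the figure is usually quoted as about 10 minutes.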

YARN & MapReduce

  1. What is YARN and when was it introduced?
  2. What are the components of YARN?
  3. What is a Container in YARN?
  4. What is the difference between Job Tracker and Resource Manager?
  5. What is the Map phase in MapReduce?
  6. What is the Reduce phase?
  7. What is Shuffle and Sort?
  8. What is the role of the Combiner?
  9. How does the Partitioner work?
  10. What are the two inputs to a Reducer?
  11. What is Hadoop Streaming?
  12. Can MapReduce process real-time data?
  13. What is Apache Spark and how does it differ from MapReduce?
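
Questions 8 and 9 in this group ask about the Combiner and the Partitioner. A small plain-Java sketch of both ideas follows, using the hash(key) % numReducers rule from the definitions table. The sign bit of the hash is masked so the result is never negative, which is also how Hadoop's default hash partitioner is commonly described. The class name and the sample data are hypothetical, and no Hadoop API is used.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionCombineSketch {

    // Partitioner: hash(key) % numReducers, with the sign bit masked off
    // so that a negative hashCode cannot produce a negative reducer index.
    public static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Combiner: pre-sum the counts on one mapper's output so that
    // fewer (word, count) pairs have to be shuffled across the network.
    public static Map<String, Integer> combine(List<String> mapperWords) {
        Map<String, Integer> local = new HashMap<>();
        for (String word : mapperWords) {
            local.merge(word, 1, Integer::sum);
        }
        return local;
    }

    public static void main(String[] args) {
        List<String> words = List.of("big", "data", "big", "big");
        Map<String, Integer> combined = combine(words);

        // 4 pairs leave the mapper without a combiner, 2 with one.
        System.out.println(words.size() + " pairs before, "
                + combined.size() + " after combining");

        // The same key always maps to the same reducer index.
        for (String key : combined.keySet()) {
            System.out.println(key + " -> reducer " + partition(key, 3));
        }
    }
}
```

A combiner is only safe when the reduce operation gives the same result on partial groups, as summing does.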

Hive, Pig & Ecosystem

  1. What is Apache Hive?
  2. What is HiveQL?
  3. What is the Hive Metastore?
  4. Why is Hive preferred over writing MapReduce in Java?
  5. What is Apache Pig?
  6. What is Pig Latin?
  7. What is the difference between Hive and Pig?
  8. What is HBase?
  9. What is the difference between HBase and HDFS?
  10. What is Sqoop?
  11. What is Flume used for?
  12. Who created Hadoop and when?
  13. What paper inspired Hadoop?

Java for Big Data

  1. What is Serialization in Java?
  2. What interface must be implemented for serialization?
  3. What is the transient keyword?
  4. What is serialVersionUID?
  5. What are Generics in Java?
  6. What are Wrapper Classes?
  7. What is Autoboxing in Java?
  8. What does peek() do in a Stack?
  9. What is the difference between structured and unstructured data?
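
The Java questions above (serialization, transient, serialVersionUID, generics, autoboxing, peek()) can be tried out in one short program. This is a minimal sketch using only the standard library; the `Student` and `Box<T>` classes, their fields, and the sample values are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Stack;

public class JavaBasicsSketch {

    // Implements the Serializable marker interface so it can become a byte stream.
    public static class Student implements Serializable {
        // Version stamp checked during deserialization.
        private static final long serialVersionUID = 1L;
        public final String name;
        public final transient String password; // transient: not written to the stream

        public Student(String name, String password) {
            this.name = name;
            this.password = password;
        }
    }

    // Generic class: T is fixed at compile time for each use, e.g. Box<Integer>.
    public static class Box<T> {
        private final T value;
        public Box(T value) { this.value = value; }
        public T get() { return value; }
    }

    // Serializes the object to bytes in memory, then deserializes a copy.
    public static Student roundTrip(Student original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Student) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Student copy = roundTrip(new Student("Asha", "secret"));
        System.out.println(copy.name + " / " + copy.password); // prints Asha / null

        Box<Integer> box = new Box<>(42); // autoboxing: int 42 becomes an Integer
        int unboxed = box.get();          // unboxing: Integer back to int
        System.out.println(unboxed);      // prints 42

        Stack<String> stack = new Stack<>();
        stack.push("a");
        stack.push("b");
        // peek() returns the top element without removing it.
        System.out.println(stack.peek() + " " + stack.size()); // prints b 2
    }
}
```

The transient field comes back as null because it was never written to the stream, while the non-transient name survives the round trip.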