Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

4.3 Failures, Scheduling & Task Execution

Lesson 26 of 36 in the free Big Data-1 notes on Siksha Sarovar, written by Rohit Jangra.

4.3.1 Handling Failures: The Resilience of Hadoop

Hadoop is built on the principle that "Failure is the norm."

1. Task Failure

  • If a Mapper or Reducer crashes, the NodeManager (NM) reports the failure to the job's ApplicationMaster (AM).
  • The AM reschedules the task, preferably on a different node.
  • It tries each task up to 4 times (the default) before failing the entire job.
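
The retry behaviour above can be sketched as a small simulation. This is a minimal illustration, not Hadoop's actual code: `run_with_retries`, `flaky_task` and the node names are hypothetical, and the only value taken from the notes is the default limit of 4 attempts.

```python
MAX_ATTEMPTS = 4  # the default attempt limit mentioned in the notes


def run_with_retries(task, nodes, max_attempts=MAX_ATTEMPTS):
    """Run `task` on one node at a time until it succeeds or attempts run out.

    `task` is a hypothetical callable that takes a node name and raises an
    exception when the attempt fails. Returns (node, attempt_number).
    """
    failed_nodes = []
    for attempt in range(1, max_attempts + 1):
        # Prefer a node where this task has not failed before.
        candidates = [n for n in nodes if n not in failed_nodes] or nodes
        node = candidates[0]
        try:
            task(node)
            return node, attempt
        except Exception:
            failed_nodes.append(node)
    # All attempts used up: the whole job is failed.
    raise RuntimeError(f"job failed: task exhausted {max_attempts} attempts")


def flaky_task(node):
    # Hypothetical task that only succeeds on node "n3".
    if node != "n3":
        raise IOError(f"task crashed on {node}")


print(run_with_retries(flaky_task, ["n1", "n2", "n3", "n4"]))  # ('n3', 3)
```

Here the task fails on n1 and n2, succeeds on its third attempt, and the job continues. A task that fails on every node would raise after the fourth attempt, which corresponds to the job failing.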

2. NodeManager Failure

  • If a NodeManager stops sending heartbeats to the ResourceManager (RM), the RM marks the node as "down" once a timeout expires (10 minutes by default).
  • All tasks that were running on that node are marked as failed and rescheduled on other nodes.
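
Heartbeat-based failure detection can be sketched as follows. This is a simplified model under stated assumptions: the function names, node names and timestamps are hypothetical, and the 600-second expiry is used only as an illustrative timeout.

```python
EXPIRY_SECONDS = 600  # illustrative timeout (assumption for this sketch)


def find_dead_nodes(last_heartbeat, now, expiry=EXPIRY_SECONDS):
    """Return nodes whose most recent heartbeat is older than `expiry` seconds."""
    return sorted(n for n, t in last_heartbeat.items() if now - t > expiry)


def tasks_to_reschedule(running_tasks, dead_nodes):
    """Return the tasks that were running on dead nodes and must run again."""
    dead = set(dead_nodes)
    return sorted(t for t, node in running_tasks.items() if node in dead)


# Hypothetical cluster state: node -> time of last heartbeat (seconds).
heartbeats = {"n1": 1000, "n2": 350, "n3": 990}
# Hypothetical placement: task -> node it is running on.
tasks = {"map_0": "n1", "map_1": "n2", "reduce_0": "n2"}

dead = find_dead_nodes(heartbeats, now=1000)
print(dead)                              # ['n2']
print(tasks_to_reschedule(tasks, dead))  # ['map_1', 'reduce_0']
```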

3. ResourceManager Failure

  • In modern Hadoop, we use High Availability (HA) with an Active ResourceManager and a Standby ResourceManager. If the Active fails, the Standby takes over automatically, using ZooKeeper to coordinate the failover.
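
The Active/Standby idea can be illustrated with a toy lock service. This is a conceptual sketch only: `LockService` is a hypothetical stand-in for ZooKeeper, not its real API, and the names `rm1` and `rm2` are made up. The assumption modelled here is that whichever ResourceManager holds the lock is Active, and the lock is released when its owner's session ends.

```python
class LockService:
    """Hypothetical stand-in for ZooKeeper: at most one holder of the 'active' lock."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, name):
        # The lock is granted only if nobody holds it yet.
        if self.holder is None:
            self.holder = name
        return self.holder == name

    def session_expired(self, name):
        # The lock disappears when its owner's session ends (e.g. the RM crashed).
        if self.holder == name:
            self.holder = None


def elect(lock, candidates):
    """Each candidate tries for the lock; return the one that becomes Active."""
    for rm in candidates:
        if lock.try_acquire(rm):
            return rm
    return None


lock = LockService()
print(elect(lock, ["rm1", "rm2"]))  # rm1
lock.session_expired("rm1")         # the Active RM fails
print(elect(lock, ["rm2"]))         # rm2
```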

4.3.2 Job Scheduling in Hadoop

Since Big Data clusters are expensive, they are usually shared. We need a way to decide "Who goes first?"

  • FIFO Scheduler: First In, First Out. Best for: private clusters or simple testing.
  • Capacity Scheduler: Partitions the cluster into queues (e.g., 50% for Marketing, 50% for Finance). Best for: corporate environments with guaranteed resources.
  • Fair Scheduler: Dynamically shares resources. If only one job is running, it gets 100%. If a second job starts, they get 50% each. Best for: research environments with many small tasks.
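
The three policies can be compared with a short sketch. This is a simplified model rather than the real schedulers: the function names, job names and the 100-container cluster are hypothetical, and real schedulers handle details (preemption, weights, queue hierarchies) that are ignored here.

```python
def fifo_order(jobs):
    """FIFO: run jobs strictly in order of submission time.

    `jobs` is a list of (job_id, submit_time) pairs.
    """
    return [job_id for job_id, _ in sorted(jobs, key=lambda j: j[1])]


def capacity_shares(queue_percent, total_containers):
    """Capacity: each queue is guaranteed a fixed percentage of the cluster."""
    return {q: total_containers * p // 100 for q, p in queue_percent.items()}


def fair_shares(running_jobs, total_containers):
    """Fair: the cluster is split evenly among the jobs running right now."""
    if not running_jobs:
        return {}
    share = total_containers / len(running_jobs)
    return {job: share for job in running_jobs}


print(fifo_order([("job_b", 5), ("job_a", 2)]))
# ['job_a', 'job_b']
print(capacity_shares({"marketing": 50, "finance": 50}, 100))
# {'marketing': 50, 'finance': 50}
print(fair_shares(["job_a"], 100))
# {'job_a': 100.0}
print(fair_shares(["job_a", "job_b"], 100))
# {'job_a': 50.0, 'job_b': 50.0}
```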

4.3.3 Speculative Execution: The Straggler Solution

In a large cluster, some nodes might be "stragglers": nodes that run much slower than the others because of failing hardware or network congestion, so the tasks assigned to them lag behind.

  • Hadoop's Solution: Launch a duplicate "speculative" task on a different node.
  • The Race: Whoever finishes first wins and the other task is killed.
  • Benefit: This prevents one bad node from slowing down a job that involves 1,000 other healthy nodes.
  • JVM Reuse (a related but separate optimization): Instead of starting a new Java Virtual Machine for every tiny task, Hadoop can reuse the same JVM across tasks, saving startup overhead.
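
The effect of the speculative-execution race on job completion time can be sketched with a little arithmetic. This is a simplified model under stated assumptions: a fixed `threshold` decides when a task counts as a straggler (real Hadoop compares task progress instead), and all function names and durations are hypothetical.

```python
def finish_time(original_duration, backup_start, backup_duration):
    """The first attempt to finish wins; the other attempt is killed."""
    return min(original_duration, backup_start + backup_duration)


def job_time(task_durations, threshold, healthy_duration, speculate=True):
    """A job finishes when its slowest task finishes.

    Assumption for this sketch: a task still running at `threshold` seconds
    gets a backup attempt started at that moment on a healthy node, where it
    takes `healthy_duration` seconds.
    """
    times = []
    for duration in task_durations:
        if speculate and duration > threshold:
            times.append(finish_time(duration, threshold, healthy_duration))
        else:
            times.append(duration)
    return max(times)


# Three healthy tasks and one straggler (durations in seconds).
durations = [10, 11, 9, 60]
print(job_time(durations, threshold=15, healthy_duration=10, speculate=False))  # 60
print(job_time(durations, threshold=15, healthy_duration=10, speculate=True))   # 25
```

Without speculation the job waits 60 seconds for the straggler. With it, the backup attempt starts at 15 seconds and finishes at 25 seconds, so the job completes then.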