Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

4.2 Anatomy of a MapReduce Job Run

Lesson 25 of 36 in the free Big Data-1 notes on Siksha Sarovar, written by Rohit Jangra.

4.2.1 The Classic MapReduce (MR1) Architecture

In early versions of Hadoop (0.x and 1.x), a job run was managed by two daemons:

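  • JobTracker (Master): Coordinates the entire job run. It handles both resource management (assigning tasks to free slots on TaskTrackers) and job monitoring (tracking task progress and rescheduling tasks that fail).
  • TaskTracker (Slave): Runs the actual map and reduce tasks in a fixed number of map and reduce slots. It periodically sends "heartbeats" to the JobTracker to report that it is alive, how many slots are free, and how its tasks are progressing.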
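The division of labour between the two daemons, driven by heartbeats, can be sketched in Python. This is a toy model and not Hadoop code. The classes `JobTracker` and `TaskTracker`, their method names and the slot counts are all hypothetical, and only map slots are modelled. Each heartbeat reports how many slots are free. The JobTracker fills those slots from a queue, and a failed task goes back on the queue so it can be rescheduled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TaskTracker:
    """Hypothetical worker with a fixed number of map slots (MR1 style)."""
    name: str
    map_slots: int
    running: list = field(default_factory=list)

    def heartbeat(self) -> dict:
        # Report liveness plus how many slots are free right now.
        return {"tracker": self.name, "free_slots": self.map_slots - len(self.running)}


class JobTracker:
    """Hypothetical master: hands out queued tasks in heartbeat responses."""

    def __init__(self, tasks):
        self.pending = deque(tasks)
        self.assignments = {}  # task -> tracker name

    def on_heartbeat(self, tracker: TaskTracker, report: dict) -> None:
        # Fill each free slot reported in the heartbeat with a pending task.
        for _ in range(report["free_slots"]):
            if not self.pending:
                break
            task = self.pending.popleft()
            tracker.running.append(task)
            self.assignments[task] = tracker.name

    def on_task_failed(self, tracker: TaskTracker, task: str) -> None:
        # Free the slot and requeue the task so a later heartbeat reschedules it.
        tracker.running.remove(task)
        del self.assignments[task]
        self.pending.append(task)


jt = JobTracker([f"map-{i}" for i in range(5)])
trackers = [TaskTracker("tt1", map_slots=2), TaskTracker("tt2", map_slots=2)]
for tt in trackers:
    jt.on_heartbeat(tt, tt.heartbeat())
print(jt.assignments)       # four tasks placed, one still pending
jt.on_task_failed(trackers[0], "map-0")
print(list(jt.pending))     # ['map-4', 'map-0']
```

A single object handles every heartbeat and holds all of the scheduling state. That is why, in this model, losing it means losing every job's state.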

Limitations of MR1:

  • Scalability: A single JobTracker handled resource management and monitoring for every job, so it became a bottleneck at around 4,000 nodes.
  • Single Point of Failure: If the JobTracker crashed, every running job failed.
  • Rigidity: The cluster could run only MapReduce jobs. Other processing models (such as Spark) could not easily share the same cluster.

4.2.2 The Modern Approach: YARN (MR2)

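YARN (Yet Another Resource Negotiator), introduced in Hadoop 2.x, decoupled resource management from the programming model. The JobTracker's two responsibilities were split between a global ResourceManager and a per-application ApplicationMaster.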
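The following is a minimal sketch of what this decoupling means, built on a made-up interface. The resource-management side only ever sees generic container requests (memory and vcores). Each framework's ApplicationMaster translates its own concepts, such as map and reduce tasks, into those requests. `ContainerRequest`, `ApplicationMaster`, `MapReduceAM`, `OtherFrameworkAM` and `total_demand` are hypothetical names, and the memory sizes are illustrative values.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ContainerRequest:
    """What the resource manager sees: only resources, never 'map' or 'reduce'."""
    memory_mb: int
    vcores: int


class ApplicationMaster(Protocol):
    """Hypothetical plug-in point: any framework implements this method."""
    def resource_requests(self) -> list[ContainerRequest]: ...


class MapReduceAM:
    def __init__(self, num_maps: int, num_reduces: int):
        self.num_maps, self.num_reduces = num_maps, num_reduces

    def resource_requests(self) -> list[ContainerRequest]:
        # Translate MapReduce tasks into generic container requests.
        return [ContainerRequest(1024, 1)] * (self.num_maps + self.num_reduces)


class OtherFrameworkAM:
    """Stand-in for a non-MapReduce engine (for example, a Spark-like framework)."""
    def __init__(self, executors: int):
        self.executors = executors

    def resource_requests(self) -> list[ContainerRequest]:
        return [ContainerRequest(4096, 2)] * self.executors


def total_demand(am: ApplicationMaster) -> tuple[int, int]:
    # The resource-management code is identical for every framework.
    reqs = am.resource_requests()
    return sum(r.memory_mb for r in reqs), sum(r.vcores for r in reqs)


print(total_demand(MapReduceAM(num_maps=3, num_reduces=1)))  # (4096, 4)
print(total_demand(OtherFrameworkAM(executors=2)))           # (8192, 4)
```

Only the ApplicationMaster knows about the programming model. Supporting a new engine means writing a new AM, not changing the resource manager.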

The Three Main Components of YARN:

  1. ResourceManager (RM): The ultimate authority that arbitrates resources among all applications in the system.
  2. NodeManager (NM): The per-machine agent responsible for launching "Containers" (bundles of CPU and RAM), monitoring their resource usage and reporting it to the ResourceManager.
  3. ApplicationMaster (AM): A per-application, framework-specific entity (for example, the MapReduce AM) that negotiates resources from the RM and works with the NMs to execute and monitor tasks.
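The interplay of the three components can be sketched as a toy allocator, under some simplifying assumptions. `ResourceManager`, `NodeManager` and their methods are hypothetical. Placement is plain first-fit, whereas real YARN uses pluggable schedulers such as the Capacity Scheduler or Fair Scheduler. The loop at the bottom plays the role of an ApplicationMaster.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class NodeManager:
    """Hypothetical per-machine agent: launches containers and tracks them."""
    host: str
    memory_mb: int
    vcores: int
    containers: dict = field(default_factory=dict)

    def launch(self, container_id: str, command: str) -> None:
        # Start the given command inside a container the RM has already granted.
        self.containers[container_id] = command


class ResourceManager:
    """Hypothetical RM that arbitrates cluster resources with first-fit placement."""

    def __init__(self) -> None:
        self.free = {}  # host -> [free_mb, free_vcores], as registered by each NM
        self._ids = count(1)

    def register(self, nm: NodeManager) -> None:
        self.free[nm.host] = [nm.memory_mb, nm.vcores]

    def allocate(self, memory_mb: int, vcores: int):
        # Grant a container on the first node with enough free capacity.
        for host, cap in self.free.items():
            if cap[0] >= memory_mb and cap[1] >= vcores:
                cap[0] -= memory_mb
                cap[1] -= vcores
                return host, f"container_{next(self._ids):02d}"
        return None  # no single node has enough free capacity


nms = {h: NodeManager(h, memory_mb=4096, vcores=4) for h in ("node1", "node2")}
rm = ResourceManager()
for nm in nms.values():
    rm.register(nm)

# Acting as the ApplicationMaster: ask the RM for one container per task,
# then ask the NodeManager on the granted host to launch the task.
for task in ("map-0", "map-1", "map-2"):
    host, cid = rm.allocate(memory_mb=2048, vcores=1)
    nms[host].launch(cid, task)

print(nms["node1"].containers)  # {'container_01': 'map-0', 'container_02': 'map-1'}
print(nms["node2"].containers)  # {'container_03': 'map-2'}
print(rm.allocate(memory_mb=8192, vcores=1))  # None
```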

4.2.3 The Journey of a Job in YARN

  1. Submission: The client copies the job resources (JAR, configuration, input splits) to HDFS and submits the job to the ResourceManager.
  2. AM Startup: The RM allocates a container and asks a NodeManager to launch the ApplicationMaster in it.
  3. Resource Request: The AM asks the RM for containers for the map and reduce tasks.
  4. Task Launch: The AM contacts the NodeManagers to start the tasks in the allocated containers.
  5. Progress: Each task reports its progress to the AM, which aggregates it into overall job progress. The client polls the AM for this status.
  6. Cleanup: When the job finishes, the AM unregisters from the RM and its containers are released.
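The six steps can be traced with a short simulation that records one log line per interaction. It is a sketch built on simplifying assumptions. The application and container IDs are illustrative, and all task containers are requested at once (the real MapReduce AM, by default, requests reduce containers only after a fraction of the maps have completed). Job progress is simply the average of task progress. `run_job` is a hypothetical helper.

```python
def run_job(num_maps: int, num_reduces: int) -> list[str]:
    """Walk through the six steps of a YARN job run as a hypothetical event log."""
    log = []
    # 1. Submission: the client hands the application to the RM.
    app_id = "application_0001"  # illustrative ID, not real YARN formatting
    log.append(f"client -> RM: submit {app_id}")
    # 2. AM startup: the RM grants a container and a NodeManager launches the AM.
    log.append("RM -> NM: launch ApplicationMaster in container_01")
    # 3. Resource request: the AM asks the RM for one container per task.
    tasks = [f"map-{i}" for i in range(num_maps)]
    tasks += [f"reduce-{i}" for i in range(num_reduces)]
    log.append(f"AM -> RM: request {len(tasks)} containers")
    # 4. Task launch: the AM asks NodeManagers to start each task.
    progress = {}
    for n, task in enumerate(tasks, start=2):
        log.append(f"AM -> NM: start {task} in container_{n:02d}")
        progress[task] = 0.0
    # 5. Progress: each task reports to the AM; the client polls the AM.
    for task in tasks:
        progress[task] = 1.0
        job_progress = sum(progress.values()) / len(progress)
        log.append(f"client <- AM: job {job_progress:.0%} complete")
    # 6. Cleanup: the AM unregisters and its containers are released.
    log.append("AM -> RM: unregister, release containers")
    return log


for line in run_job(num_maps=2, num_reduces=1):
    print(line)
```

With two maps and one reduce, the log shows three task launches in containers 02 to 04. Job progress then rises through 33%, 67% and 100% before the AM unregisters.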