Siksha Sarovar

Siksha Sarovar (sikshasarovar.com) is a free educational web application that helps students in India learn programming and prepare for academic and competitive exams. The platform offers structured coding courses (C, C++, Python, Java, HTML, CSS, PHP, Power BI, AI, Machine Learning, Data Science), complete university curriculum notes for BCA/MCA students with previous year question papers, Class 10 and Class 12 CBSE/HBSE school notes, and dedicated preparation material for SSC, UPSC, Banking, Railway and other government exams. Browsing the site is completely free and requires no account. Users may optionally sign in with Google solely to save their learning progress, quiz scores and personal preferences across devices.

3.2 Hadoop Streaming and Pipes

Lesson 18 of 36 in the free Big Data-1 notes on Siksha Sarovar, written by Rohit Jangra.

3.2.1 Hadoop Streaming

While Hadoop is written in Java, Hadoop Streaming allows you to write MapReduce programs in any language that can read from standard input (stdin) and write to standard output (stdout).

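  • How it works: For each map or reduce task, Hadoop launches your script as a separate process, feeds it input records as lines on stdin, and reads the results back as lines on stdout. By default, each output line is split into key and value at the first tab character.
  • Popular Languages: Python, Ruby, Perl, and even Bash.
  • Use Case: Rapid prototyping, or reusing existing libraries (such as NumPy in Python) that have no direct Java equivalent.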
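The stdin/stdout contract described above can be sketched as a word count in Python. This is a minimal illustration, not code from the course: the names `map_lines`, `reduce_lines`, and `simulate`, and the convention of passing `map` or `reduce` as a command-line argument, are assumptions made for this example. The `simulate` helper stands in for Hadoop's sort-by-key step so the logic can be checked without a cluster.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts per key. Input must already be sorted by key."""
    pairs = (
        line.rstrip("\n").split("\t", 1)
        for line in sorted_lines
        if "\t" in line  # skip blank or malformed lines
    )
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def simulate(lines):
    """Local stand-in for map -> sort -> reduce (no Hadoop involved)."""
    return list(reduce_lines(sorted(map_lines(lines))))


# Assumed convention for this sketch: run the file with "map" or "reduce"
# as its only argument to act as the mapper or the reducer.
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

On a cluster, a script like this would typically be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options, together with `-input` and `-output` paths. The jar location depends on the installation.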

3.2.2 Hadoop Pipes

Hadoop Pipes is a C++ interface for Hadoop MapReduce.

  • Contrast with Streaming: Streaming exchanges text over stdin and stdout, whereas Pipes uses sockets to communicate between the Java TaskTracker and the C++ application code.
  • Benefit: Can run faster than Streaming for computationally intensive tasks, because the logic is compiled C++ and records are not exchanged as text lines.
  • Complexity: Requires more setup, since you must compile your C++ code against the Hadoop Pipes library.
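To make the difference between text pipes and binary sockets concrete, here is a toy sketch in Python. It is not the actual Hadoop Pipes wire protocol and does not use the Pipes library: the record layout (4-byte key length, key bytes, 8-byte double) and all function names are assumptions chosen for illustration. It sends one key/value record over a local socket pair in binary form, next to the tab-separated text form that Streaming would use.

```python
import socket
import struct


def encode_text(key, value):
    """Streaming-style record: 'key<TAB>value<NEWLINE>' as UTF-8 text."""
    return f"{key}\t{value}\n".encode("utf-8")


def encode_binary(key, value):
    """Hypothetical binary record: big-endian length, key bytes, double."""
    raw = key.encode("utf-8")
    return struct.pack(f">I{len(raw)}sd", len(raw), raw, value)


def recv_exact(sock, n):
    """Read exactly n bytes; a single recv() may return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed early")
        buf += chunk
    return buf


def send_record(sock, key, value):
    """Send one binary record over the socket."""
    sock.sendall(encode_binary(key, value))


def recv_record(sock):
    """Receive one binary record: read the length first, then the rest."""
    (n,) = struct.unpack(">I", recv_exact(sock, 4))
    raw, value = struct.unpack(f">{n}sd", recv_exact(sock, n + 8))
    return raw.decode("utf-8"), value


def demo():
    """Round-trip one record through a connected local socket pair."""
    a, b = socket.socketpair()
    try:
        send_record(a, "temp", 21.5)
        return recv_record(b)
    finally:
        a.close()
        b.close()
```

In the binary form the receiver gets the number back directly, without formatting it as text on one side and parsing it on the other. Whether the binary record is also smaller depends on the values involved.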

Comparison: Streaming vs. Pipes

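Feature          | Streaming                      | Pipes
-----------------+--------------------------------+-----------------------------
Primary Language | Python, Ruby, Shell            | C++
Communication    | Stdin / Stdout                 | TCP/IP Sockets
Performance      | Good (but text-based overhead) | Excellent (Binary data flow)
Ease of Use      | Very Easy                      | Moderate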