How MapReduce Works

  1. Write a MapReduce Java program and bundle it in a JAR file. See my previous post, “First Example Project using Eclipse”, for how to create a MapReduce program in Java using Eclipse and bundle it into a JAR file.
  2. The client submits the job to the JobTracker by running the JAR file ($ hadoop jar ….). The driver program (WordCountDriver.java) acts as the client and submits the job by calling “JobClient.runJob(conf);”. The client program can run on any node in the Hadoop cluster (as a separate JVM) or outside the cluster. In our example, we run the client program on the same machine as the JobTracker, which is usually the NameNode. The job submission steps include:

    Read More…
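The WordCountDriver in step 2 submits a word-count job. As a rough illustration of what such a job computes once submitted, here is a minimal in-memory sketch of the map, shuffle/sort and reduce phases in plain Java. It does not use Hadoop's API at all; the class name `WordCountSketch` and its methods are hypothetical, and the sketch only mimics the data flow, not distribution across TaskTrackers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of the word-count MapReduce data flow.
// Not Hadoop code: it only mimics the map, shuffle/sort and reduce phases.
public class WordCountSketch {

    // Simple key/value pair, standing in for the (word, count) records a mapper emits.
    static final class Pair {
        final String key;
        final int value;

        Pair(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Map phase: emit (word, 1) for every whitespace-separated token in a line.
    static List<Pair> map(String line) {
        List<Pair> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) { // an empty line splits into [""], so skip it
                out.add(new Pair(token, 1));
            }
        }
        return out;
    }

    // Shuffle/sort phase: group all values by key; TreeMap keeps keys sorted,
    // much as the framework sorts keys before they reach the reducers.
    static Map<String, List<Integer>> shuffle(List<Pair> mapped) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Pair p : mapped) {
            grouped.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }
        return grouped;
    }

    // Reduce phase: sum all counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runs the three phases over all input lines and returns word -> count.
    public static Map<String, Integer> run(List<String> lines) {
        List<Pair> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("hello hadoop", "hello mapreduce");
        System.out.println(run(input)); // prints {hadoop=1, hello=2, mapreduce=1}
    }
}
```

In a real Hadoop job the map and reduce functions run as separate tasks on different nodes and the framework performs the shuffle over the network; the sketch collapses all of that into one JVM so the three phases are easy to follow.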

Apache Hadoop Installation and Cluster Setup: Part-3

Let’s start with HadoopNameNode (the master); repeat these steps for the SNN (Secondary NameNode) and the two slaves. Connect to HadoopNameNode through PuTTY and run the following commands.

Update the package index

$ sudo apt-get update

Install Oracle Java

Install the latest Oracle Java (JDK) 7 in Ubuntu

Read More…

Setting up Hadoop Cluster on Amazon Cloud

I wanted to get familiar with the big data world and decided to test Hadoop on Amazon Cloud. It was an interesting and informative experience. The aim of this post is to share my experience, thoughts and observations on both practical and non-practical use of Apache Hadoop.

Overview 

A typical Hadoop multi-node cluster

Read More…