How MapReduce Works

June 29, 2015 (updated June 30, 2015) | Bikash Sen | Hadoop | Tags: combiner, copy phase, hadoop, input split, job tracker, map, mapper, mapreduce, merge, partitioning, reduce, reducer, shuffle, sort, spilling, task tracker

1. Write a MapReduce program in Java and bundle it in a JAR file. My previous post, “First Example Project using Eclipse”, shows how to create a MapReduce program in Java using Eclipse and bundle it into a JAR file.
2. The client submits the job to the JobTracker when you run the JAR file ($ hadoop jar ….). More precisely, the driver program (WordCountDriver.java) acts as the client and submits the job by calling “JobClient.runJob(conf);”. The client program can run on any node in the Hadoop cluster (as a separate JVM) or outside the cluster. In our example, we run the client program on the same machine as the JobTracker, which is usually the NameNode. The job submission steps include:
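After the job is submitted and the map tasks run, every (key, value) pair they emit is assigned to a reducer by a partitioner, then sorted and grouped by key before the reduce phase. As a minimal in-memory sketch, assuming a word-count job with made-up map output, the class below applies the same formula as Hadoop's default HashPartitioner, (hashCode & Integer.MAX_VALUE) % numReduceTasks, and uses one TreeMap per reducer to mimic, in simplified form, the shuffle's sort-and-group step. It is not Hadoop code: ShuffleSketch, partition and shuffle are hypothetical names, and Hadoop hashes the Writable key type (for example Text) rather than java.lang.String, so real partition numbers will differ. It needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {

    // Same formula as Hadoop's default HashPartitioner: clear the sign bit,
    // then take the remainder by the number of reduce tasks.
    // Assumption: String.hashCode() stands in for the Writable key's hash.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Route each (key, value) pair emitted by the mappers to its reducer,
    // then keep keys sorted and values grouped, as the shuffle/sort phase does.
    static List<TreeMap<String, List<Integer>>> shuffle(
            List<Map.Entry<String, Integer>> mapOutput, int numReducers) {
        List<TreeMap<String, List<Integer>>> reducerInputs = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            reducerInputs.add(new TreeMap<>()); // TreeMap keeps keys sorted
        }
        for (Map.Entry<String, Integer> pair : mapOutput) {
            int r = partition(pair.getKey(), numReducers);
            reducerInputs.get(r)
                    .computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                    .add(pair.getValue());
        }
        return reducerInputs;
    }

    public static void main(String[] args) {
        // Hypothetical map output for the line "to be or not to be".
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("to", 1), Map.entry("be", 1), Map.entry("or", 1),
                Map.entry("not", 1), Map.entry("to", 1), Map.entry("be", 1));
        List<TreeMap<String, List<Integer>>> inputs = shuffle(mapOutput, 2);
        for (int r = 0; r < inputs.size(); r++) {
            // A word-count reducer would now sum each grouped list of values.
            System.out.println("reducer " + r + " <- " + inputs.get(r));
        }
    }
}
```

Running main prints each reducer's sorted, grouped input. With so few keys and two reducers, one reducer may receive no keys at all, which is normal for hash partitioning.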

The Hadoop Ecosystem Table

February 3, 2015 (updated April 5, 2015) | Bikash Sen | Hadoop | Tags: ecosystem table, hadoop ecosystem, hadoop latest release, hadoop written in

A list of the major projects and tools surrounding Hadoop, with their categories, that together make up an enterprise data platform. The ecosystem is growing at a rapid pace to address the three Vs of Big Data: Volume (Big), Velocity (Fast) and Variety (Smart). See the table below:

Hadoop: First Example Project using Eclipse: Part-4

January 3, 2015 (updated February 7, 2015) | Bikash Sen | Hadoop | Tags: basic hadoop example, hadoop eclipse, hadoop wordcount, mapreduce with eclipse, Word count

Download and install Eclipse here.

Create New Java Project
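Once the Java project exists, the word-count logic can be tried in a plain class before any Hadoop libraries are added. This is a sketch under stated assumptions, not the Hadoop mapper and reducer: LocalWordCount and countWords are hypothetical names. It tokenizes on whitespace with java.util.StringTokenizer, as the classic Hadoop WordCount mapper does, and plays the reducer's role by summing counts in a TreeMap so the output is sorted by word. It needs Java 8 or later.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    // Split on whitespace (StringTokenizer's default delimiters) and count
    // each word; the TreeMap keeps words sorted, like reducer output.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer tokens = new StringTokenizer(text);
        while (tokens.hasMoreTokens()) {
            counts.merge(tokens.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Use the program arguments as input, or a small sample sentence.
        String sample = args.length > 0 ? String.join(" ", args) : "to be or not to be";
        countWords(sample).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Run it with words as program arguments (in Eclipse, via Run Configurations > Arguments) or with no arguments to count the built-in sample sentence.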

Apache Hadoop Installation and Cluster Setup: Part-3

January 3, 2015 (updated May 16, 2015) | Bikash Sen | Hadoop | Tags: apache hadoop, apache hadoop installation setup, configuration files, configure hadoop, configure hadoop on aws, hadoop setup, install hadoop, install hadoop on linux, Java, setup hadoop on amazon, slave nodes

Let’s start with HadoopNameNode (the master), then repeat the same steps for the SNN (Secondary NameNode) and the two slaves. Connect to HadoopNameNode through PuTTY and run the following commands.

Update the packages and dependencies

$ sudo apt-get update

Install Oracle Java

Install the latest Oracle Java (JDK) 7 in Ubuntu
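After the install, one way to confirm which Java runtime is actually picked up is a trivial class that prints the relevant system properties; running $ java -version in the shell gives similar information. JavaCheck and describe are hypothetical names used only for this sketch.

```java
public class JavaCheck {

    // Report the version, vendor and install directory of the JVM running this class.
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + ", java.vendor=" + System.getProperty("java.vendor")
                + ", java.home=" + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Compile and run it with $ javac JavaCheck.java && java JavaCheck. If the JDK 7 install is the one on the PATH, the printed java.version should start with 1.7.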

Setting up Client Access to Amazon EC2 Instances: Part-2

January 3, 2015 (updated May 8, 2015) | Bikash Sen | Hadoop | Tags: Amazon account, Amazon EC2, configuration files, configure hadoop, ec2 access putty filezilla, hadoopnamenode, install hadoop, PuTTY, puttygen ppk, setup hadoop on amazon, slave nodes, windows client ec2

To prepare to connect to a Linux instance from Windows using PuTTY

1. Download and install PuTTY and PuTTYgen from here.

2. Start PuTTYgen (from the Start menu, click All Programs > PuTTY > PuTTYgen).

Setting up infrastructure with Amazon EC2: Part-1

January 3, 2015 (updated February 7, 2015) | Bikash Sen | Hadoop | Tags: Amazon, Amazon account, Amazon EC2, AWS

If you’ve already signed up for Amazon Web Services (AWS), you can start using Amazon EC2 immediately. You can open the Amazon EC2 console, click Launch Instance, and follow the steps in the launch wizard to launch your first instance.

Get Amazon **FREE** AWS Account

If you do not already have an account, create a new free one. Amazon EC2 offers free-tier eligible instances. The free-tier usage limits are listed below for reference. For more information, see AWS Free Tier.

Setting up Hadoop Cluster on Amazon Cloud

January 3, 2015 (updated February 7, 2015) | Bikash Sen | Hadoop | Tags: hadoop, hadoop amazon, hadoop cluster amazon, hadoop ec2

I wanted to get familiar with the big data world and decided to try Hadoop on the Amazon Cloud. It was a really interesting and informative experience. The aim of this blog is to share my experience, thoughts and observations on both practical and non-practical uses of Apache Hadoop.

Overview

A typical Hadoop multi-node cluster

Hadoop ABCD

Let's Do Big Data…
