How to Set Up Hadoop in Pseudo-Distributed Mode on a Single-Node Cluster

Published Apr 22, 2016. Last updated Jan 18, 2017.

Assuming you are running Linux or Mac OS X, the following steps will help you set up a single-node Hadoop cluster on your local machine.

Step 1: Downloading hadoop.x.y.z.tar.gz

Download Hadoop from this link: choose a suitable mirror for your location, open the hadoop-1.2.1 folder, and download the tarball by clicking on hadoop-1.2.1.tar.gz.
Hadoop-1.2.1

  1. Download a stable release copy ending with tar.gz
  2. Create a new folder /home/hadoop
  3. Move the file hadoop.x.y.z.tar.gz to the folder /home/hadoop
  4. Type or Copy/Paste this command in terminal: cd /home/hadoop
  5. Type or Copy/Paste this command in terminal: tar xzf hadoop*tar.gz
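
Steps 2 to 5 above can be sketched as one small shell function. This is only an illustration: unpack_hadoop is a hypothetical helper name, the target folder defaults to the /home/hadoop used in this guide, and creating a folder under /home may require sudo on your machine.

```shell
# Sketch: move a downloaded Hadoop tarball into a folder and extract it.
# unpack_hadoop is a hypothetical helper, not part of Hadoop.
# The body runs in a subshell, so the cd does not affect your terminal.
unpack_hadoop() (
  tarball="$1"                  # e.g. ~/Downloads/hadoop-1.2.1.tar.gz
  target="${2:-/home/hadoop}"   # folder used in this guide
  mkdir -p "$target" || exit 1  # step 2: create the folder
  mv "$tarball" "$target"/ || exit 1   # step 3: move the tarball
  cd "$target" || exit 1        # step 4: go to the folder
  tar xzf "$(basename "$tarball")"     # step 5: extract
)

# Example call (path is an assumption):
#   unpack_hadoop ~/Downloads/hadoop-1.2.1.tar.gz
```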

From here on, I will write "Type or Copy/Paste this command in terminal" as Type/Copy/Paste.

Step 2: Downloading and setting up Java

I assume you don't have Java installed and are setting it up from scratch.
To check whether it is already installed, type

java -version

in your terminal. If Java is installed, make sure your JAVA_HOME variable is set up too; if it is not, follow these steps:

Type/Copy/Paste:

sudo apt-get purge openjdk-\*

Type/Copy/Paste:

sudo mkdir -p /usr/local/java

Download the Java JDK and JRE from the link below. Look for the Linux 64-bit files ending in tar.gz:
http://www.oracle.com/technetwork/java/javase/downloads/index.html

After the download finishes, go to the folder where you saved the files and copy them to the folder we created for Java:

Type/Copy/Paste:

sudo cp -r jdk-*.tar.gz /usr/local/java

Type/Copy/Paste:

sudo cp -r jre-*.tar.gz /usr/local/java

Extract and install Java:

Type/Copy/Paste:

cd /usr/local/java

Type/Copy/Paste:

sudo tar xvzf jdk*.tar.gz

Type/Copy/Paste:

sudo tar xvzf jre*.tar.gz

Now put all the variables in the profile.

Type/Copy/Paste:

sudo gedit /etc/profile

At the end of the file, copy and paste the following code:
(Note: change the version number and folder paths to match where you installed Java and Hadoop. The version number has probably changed since I wrote this guide, so make sure that every path you mention actually exists.)

JAVA_HOME=/usr/local/java/jdk1.7.0_40
PATH=$PATH:$JAVA_HOME/bin
JRE_HOME=/usr/local/java/jre1.7.0_40
PATH=$PATH:$JRE_HOME/bin
HADOOP_INSTALL=/home/hadoop/hadoop-1.2.1
PATH=$PATH:$HADOOP_INSTALL/bin
export JAVA_HOME
export JRE_HOME
export PATH
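
The note above asks you to make sure the paths really exist. A minimal sketch of such a check follows; check_dirs is a hypothetical helper name, and you would pass it whatever folders you put in /etc/profile.

```shell
# Sketch: report whether each given directory exists.
# check_dirs is a hypothetical helper used only in this example.
# It returns a non-zero status if at least one directory is missing.
check_dirs() (
  status=0
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      echo "ok: $dir"
    else
      echo "missing: $dir"
      status=1
    fi
  done
  exit "$status"   # only leaves the subshell that forms the function body
)

# Example call after refreshing the profile:
#   check_dirs "$JAVA_HOME" "$JRE_HOME" "$HADOOP_INSTALL"
```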

Do the following so that Linux knows where Java is:
(Again, note that the following paths may need to be changed to match your installation)

sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jre1.7.0_40/bin/java" 1

sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.7.0_40/bin/javac" 1
 
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/local/java/jre1.7.0_40/bin/javaws" 1
 
sudo update-alternatives --set java /usr/local/java/jre1.7.0_40/bin/java
 
sudo update-alternatives --set javac /usr/local/java/jdk1.7.0_40/bin/javac
 
sudo update-alternatives --set javaws /usr/local/java/jre1.7.0_40/bin/javaws
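
The six commands above differ only in the tool name and the folder, so one way to adapt them to your own paths is to generate them. The following is a dry-run sketch: print_java_alternatives is a hypothetical helper name, it only prints the commands so you can review them before running them yourself, and it assumes paths without spaces.

```shell
# Dry-run sketch: print (do not run) the update-alternatives commands
# for a given JDK folder and JRE folder.
# print_java_alternatives is a hypothetical helper, not a system tool.
print_java_alternatives() (
  jdk="$1"   # e.g. /usr/local/java/jdk1.7.0_40
  jre="$2"   # e.g. /usr/local/java/jre1.7.0_40
  # java and javaws come from the JRE, javac from the JDK, as in the guide.
  for entry in "java:$jre" "javac:$jdk" "javaws:$jre"; do
    tool=${entry%%:*}   # text before the first colon
    home=${entry#*:}    # text after the first colon
    echo "sudo update-alternatives --install /usr/bin/$tool $tool $home/bin/$tool 1"
    echo "sudo update-alternatives --set $tool $home/bin/$tool"
  done
)

# Example call:
#   print_java_alternatives /usr/local/java/jdk1.7.0_40 /usr/local/java/jre1.7.0_40
```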

Refresh the profile with

. /etc/profile

Test it by typing

java -version

and you will get something like this

java version "1.8.0_40"
Java(TM) SE Runtime Environment (build 1.8.0_40-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)

Pseudo Distributed Mode

Type/Copy/Paste

sudo apt-get install ssh

Then

sudo apt-get install rsync

Navigate to /home/hadoop/hadoop-1.2.1 and then follow these steps:

Change conf/core-site.xml to

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

Change conf/hdfs-site.xml to

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

Change conf/mapred-site.xml to

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
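
If you prefer to script the three edits above instead of making them by hand, something like the following could work. It is a sketch only: both function names are hypothetical, the files are overwritten rather than merged, and the property names and values are exactly the ones shown in this guide.

```shell
# Sketch: write a Hadoop site file that holds a single property.
# write_property_file and write_pseudo_distributed_conf are hypothetical
# helper names. Existing files are overwritten, so back them up first.
write_property_file() (
  file="$1"; name="$2"; value="$3"
  cat > "$file" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$name</name>
    <value>$value</value>
  </property>
</configuration>
EOF
)

# Write the three files from this guide into the given conf folder.
write_pseudo_distributed_conf() (
  conf_dir="$1"   # e.g. /home/hadoop/hadoop-1.2.1/conf
  write_property_file "$conf_dir/core-site.xml" fs.default.name hdfs://localhost:9000 &&
  write_property_file "$conf_dir/hdfs-site.xml" dfs.replication 1 &&
  write_property_file "$conf_dir/mapred-site.xml" mapred.job.tracker localhost:9001
)

# Example call:
#   write_pseudo_distributed_conf /home/hadoop/hadoop-1.2.1/conf
```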

Edit conf/hadoop-env.sh, look for the JAVA_HOME line, and set it to your JDK folder:

export JAVA_HOME=/usr/local/java/jdk1.7.0_40

Note: replace this 1.7.0_40 with the version you have installed
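
As an alternative to editing the file by hand, the export line can be appended to the end of hadoop-env.sh, since the file is an ordinary shell script. A minimal sketch, assuming a hypothetical helper named set_hadoop_java_home that skips the append when the exact line is already present:

```shell
# Sketch: append "export JAVA_HOME=..." to hadoop-env.sh unless the
# exact same line is already there.
# set_hadoop_java_home is a hypothetical helper, not part of Hadoop.
set_hadoop_java_home() (
  env_file="$1"    # e.g. conf/hadoop-env.sh
  java_home="$2"   # e.g. /usr/local/java/jdk1.7.0_40
  line="export JAVA_HOME=$java_home"
  if grep -qxF -- "$line" "$env_file"; then
    echo "already set"
  else
    printf '%s\n' "$line" >> "$env_file"
    echo "appended"
  fi
)

# Example call from the Hadoop folder:
#   set_hadoop_java_home conf/hadoop-env.sh /usr/local/java/jdk1.7.0_40
```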

Set up passwordless ssh with the following steps:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

To confirm that passwordless ssh has been set up, type the following and you should not be prompted for a password.

ssh localhost
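
If ssh localhost still asks for a password, two common causes are file permissions that are too open and, on newer OpenSSH releases, DSA keys being disabled by default (in that case generating the key with -t rsa and ~/.ssh/id_rsa is the usual substitute). The sketch below covers the permissions side only; authorize_key is a hypothetical helper name and the default folder is assumed to be ~/.ssh.

```shell
# Sketch: add a public key to authorized_keys once, with the strict
# permissions that sshd normally expects.
# authorize_key is a hypothetical helper, not an OpenSSH command.
authorize_key() (
  pub_key="$1"                 # e.g. ~/.ssh/id_dsa.pub or ~/.ssh/id_rsa.pub
  ssh_dir="${2:-$HOME/.ssh}"   # folder holding authorized_keys
  auth="$ssh_dir/authorized_keys"
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || exit 1
  touch "$auth" && chmod 600 "$auth" || exit 1
  key=$(cat "$pub_key") || exit 1
  # Append only if this exact key line is not already present.
  grep -qxF -- "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
)

# Example call:
#   authorize_key ~/.ssh/id_dsa.pub
```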

Now, navigate to the folder where you extracted the Hadoop tarball.

Mine is /home/hadoop/hadoop-1.2.1/.

Format the name node:

bin/hadoop namenode -format

Start all the daemons:

bin/start-all.sh

Now type jps in your terminal window to check whether all the processes are up and running. jps lists the Java processes that are currently running.
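
With Hadoop 1.x, a working pseudo-distributed setup normally shows five daemons in the jps output: NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker. As a rough sketch, the check can be automated by piping jps into a small function; missing_daemons is a hypothetical helper name, and it assumes the usual "pid name" layout of jps lines.

```shell
# Sketch: read jps output on stdin and print the name of every expected
# Hadoop 1.x daemon that does not appear in it.
# missing_daemons is a hypothetical helper, not part of Hadoop or the JDK.
missing_daemons() (
  jps_output=$(cat)   # keep the whole jps listing for repeated checks
  for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # Compare the second column exactly, so NameNode does not match
    # SecondaryNameNode.
    printf '%s\n' "$jps_output" |
      awk -v name="$daemon" '$2 == name { found = 1 } END { exit !found }' ||
      echo "$daemon"
  done
)

# Example call; no output means all five daemons were found:
#   jps | missing_daemons
```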

To view the daemon processes and other stats, follow these steps:

Open these addresses in a browser window to get the UI for the NameNode, http://localhost:50070/, and the JobTracker, http://localhost:50030/.

Stop all the daemons from your terminal:

bin/stop-all.sh

Congratulations! You have successfully set up a Single Node Pseudo-Distributed cluster on your local machine.

References - https://www.udemy.com/hadoop-tutorial/
I followed the steps given in these video lectures and have explained them here, along with fixes for the difficulties I faced while setting up.

Discover and read more posts from Lakshay Nagpal