How to Set Up Hadoop in Pseudo-Distributed Mode on a Single-Node Cluster
Assuming you are running Linux or Mac OS X, the following steps will help you set up a single-node Hadoop cluster on your local machine.
Step 1: Downloading hadoop.x.y.z.tar.gz
Download Hadoop from this link: choose a suitable mirror for your location, open the hadoop-1.2.1 folder, and download the tarball by clicking on hadoop-1.2.1.tar.gz.
- Download a stable release ending in tar.gz
- Create a new folder /home/hadoop
- Move the file hadoop.x.y.z.tar.gz to the folder /home/hadoop
- Type or Copy/Paste this command in terminal: cd /home/hadoop
- Type or Copy/Paste this command in terminal: tar xzf hadoop*tar.gz
From here on, I will write "Type or Copy/Paste this command in terminal" as Type/Copy/Paste.
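The move-and-extract steps above can also be sketched as a small shell function. This is only an illustration: extract_hadoop is a hypothetical helper name, and the paths in the usage comment are the ones assumed in this guide. It unpacks the tarball and then checks that the bin/hadoop launcher actually exists.

```shell
# Sketch: unpack a Hadoop tarball into a target folder and confirm that the
# bin/hadoop launcher is present afterwards.
# extract_hadoop is a hypothetical helper name, not part of Hadoop.
extract_hadoop() {
    tarball=$1   # e.g. /home/hadoop/hadoop-1.2.1.tar.gz
    dest=$2      # e.g. /home/hadoop

    [ -f "$tarball" ] || { echo "tarball not found: $tarball" >&2; return 1; }
    mkdir -p "$dest" || return 1
    tar xzf "$tarball" -C "$dest" || return 1

    # The tarball unpacks into a single top-level folder such as hadoop-1.2.1
    for dir in "$dest"/hadoop-*/; do
        if [ -x "${dir}bin/hadoop" ]; then
            echo "Hadoop unpacked in ${dir%/}"
            return 0
        fi
    done
    echo "bin/hadoop not found under $dest" >&2
    return 1
}

# Usage (paths are the ones assumed in this guide):
# extract_hadoop /home/hadoop/hadoop-1.2.1.tar.gz /home/hadoop
```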
Step 2: Downloading and setting up Java
I assume you don't have Java installed and are setting it up from scratch.
If it is already installed, you can check by typing
java -version
in your terminal. Make sure your JAVA_HOME variable is set; if it is not, follow these steps:
sudo apt-get purge openjdk-\*
sudo mkdir -p /usr/local/java
Download the Java JDK and JRE from the link below; look for the Linux 64-bit files ending in tar.gz:
After the download finishes, go to the folder where you saved the files and copy them to the folder we created for Java:
sudo cp -r jdk-*.tar.gz /usr/local/java
sudo cp -r jre-*.tar.gz /usr/local/java
Extract and install Java from inside the folder we created:
cd /usr/local/java
sudo tar xvzf jdk*.tar.gz
sudo tar xvzf jre*.tar.gz
Now put all the variables in the profile.
sudo gedit /etc/profile
At the end, copy & paste the following code:
(Note: change the version number and path to the folder according to where you've installed Java. The version number probably changed since I wrote this guide, so just make sure that the path you mention actually exists)
JAVA_HOME=/usr/local/java/jdk1.7.0_40
PATH=$PATH:$JAVA_HOME/bin
JRE_HOME=/usr/local/java/jre1.7.0_40
PATH=$PATH:$JRE_HOME/bin
HADOOP_INSTALL=/home/hadoop/hadoop-1.2.1
PATH=$PATH:$HADOOP_INSTALL/bin
export JAVA_HOME
export JRE_HOME
export PATH
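Since the note above asks you to make sure the paths actually exist, here is a minimal sketch of how that could be checked before saving the profile. check_dirs is a hypothetical helper name, and the folders in the usage comment are the ones assumed in this guide, so adjust them to your versions.

```shell
# Sketch: report which of the given folders do not exist.
# check_dirs is a hypothetical helper name introduced for illustration.
check_dirs() {
    status=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            status=1
        fi
    done
    # Returns 0 only when every folder exists
    return $status
}

# Usage with the folders assumed in this guide (adjust to your versions):
# check_dirs /usr/local/java/jdk1.7.0_40 /usr/local/java/jre1.7.0_40 /home/hadoop/hadoop-1.2.1
```

If the function prints nothing, all the folders are present and the variables can go into /etc/profile as they are.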
Do the following so that Linux knows where Java is:
(Again, note that the following paths may need to be changed to match your installation)
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jre1.7.0_40/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.7.0_40/bin/javac" 1
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/local/java/jre1.7.0_40/bin/javaws" 1
sudo update-alternatives --set java /usr/local/java/jre1.7.0_40/bin/java
sudo update-alternatives --set javac /usr/local/java/jdk1.7.0_40/bin/javac
sudo update-alternatives --set javaws /usr/local/java/jre1.7.0_40/bin/javaws
Refresh the profile with
. /etc/profile
Test it by typing
java -version
and you will get something like this (the version numbers depend on the release you installed):
java version "1.8.0_40"
Java(TM) SE Runtime Environment (build 1.8.0_40-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
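Besides checking the version, it can help to confirm that JAVA_HOME points at a real Java installation, because Hadoop relies on that variable. The following is a minimal sketch; check_java_home is a hypothetical helper name and the messages are only illustrative.

```shell
# Sketch: check that JAVA_HOME is set and contains an executable bin/java.
# check_java_home is a hypothetical helper name introduced for illustration.
check_java_home() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "no java executable under $JAVA_HOME/bin"
        return 1
    fi
    echo "JAVA_HOME looks fine: $JAVA_HOME"
}

# Usage, after refreshing the profile:
# check_java_home
```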
Pseudo Distributed Mode
sudo apt-get install ssh
sudo apt-get install rsync
Go to /home/hadoop/hadoop-1.2.1 and then follow these steps:
Edit conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Edit conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Edit conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
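If you would rather generate these three files than edit them by hand, the same settings can be written from the terminal. This is a sketch under stated assumptions: write_site_file and write_pseudo_conf are hypothetical helper names, the file names are the Hadoop 1.x ones (core-site.xml, hdfs-site.xml, mapred-site.xml), and the function overwrites any existing copies, so back them up first if you have customised them.

```shell
# Sketch: write one single-property Hadoop site file.
# $1 = file to write, $2 = property name, $3 = property value
write_site_file() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# Sketch: write the three pseudo-distributed settings from this guide.
# Both function names are hypothetical; existing files are OVERWRITTEN.
write_pseudo_conf() {
    conf_dir=$1   # e.g. /home/hadoop/hadoop-1.2.1/conf
    [ -d "$conf_dir" ] || { echo "no such folder: $conf_dir" >&2; return 1; }

    write_site_file "$conf_dir/core-site.xml" fs.default.name hdfs://localhost:9000 &&
    write_site_file "$conf_dir/hdfs-site.xml" dfs.replication 1 &&
    write_site_file "$conf_dir/mapred-site.xml" mapred.job.tracker localhost:9001
}

# Usage (path assumed from this guide):
# write_pseudo_conf /home/hadoop/hadoop-1.2.1/conf
```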
Open conf/hadoop-env.sh, look for JAVA_HOME, and set it:
export JAVA_HOME=/usr/local/java/jdk1.7.0_40
Note: replace 1.7.0_40 with the version you have installed
Set up passwordless ssh with the following steps:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
To confirm that passwordless ssh has been set up, type the following; you should not be prompted for a password.
ssh localhost
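The same check can be scripted so that it fails instead of waiting at a password prompt. The sketch below relies on the OpenSSH options BatchMode and ConnectTimeout; check_passwordless_ssh is a hypothetical helper name. One assumption to be aware of: in batch mode ssh also refuses hosts whose key it has not seen before, so connect to localhost once by hand and accept the host key before using it.

```shell
# Sketch: verify that ssh can log in without asking for a password.
# check_passwordless_ssh is a hypothetical helper name.
check_passwordless_ssh() {
    host=${1:-localhost}
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # ConnectTimeout=5 stops it from hanging if sshd is not running.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
        echo "passwordless ssh to $host works"
    else
        echo "passwordless ssh to $host failed"
        return 1
    fi
}

# Usage:
# check_passwordless_ssh            # checks localhost
# check_passwordless_ssh somehost   # hypothetical host name
```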
Now, navigate to the folder where you extracted the Hadoop tarball.
Format the name node:
bin/hadoop namenode -format
Start all the daemons:
bin/start-all.sh
Type jps in your terminal window to check whether all the processes are up and running.
jps lists the Java processes running in the background.
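Rather than reading the jps output by eye, you could compare it against the five daemons a Hadoop 1.x pseudo-distributed setup normally runs (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker). This is a minimal sketch; missing_daemons is a hypothetical helper name, and it assumes the usual jps output of a process id followed by a class name on each line.

```shell
# Sketch: read jps output on stdin and print every expected daemon that is
# not listed. missing_daemons is a hypothetical helper name.
missing_daemons() {
    # The second column of each jps line is the class name
    running=$(awk '{print $2}')
    status=0
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$running" | grep -qx "$daemon"; then
            echo "not running: $daemon"
            status=1
        fi
    done
    return $status
}

# Usage:
# jps | missing_daemons
```

No output means all five daemons were found; any "not running" line is a hint to look at the matching file in the logs folder.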
To view the daemon processes and other statistics in a browser, open these URLs:
NameNode UI: http://localhost:50070/
JobTracker UI: http://localhost:50030/
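The web interfaces can also be probed from the terminal, which is handy on a machine without a browser. The sketch below assumes curl is installed and that the daemons use the Hadoop 1.x default ports (50070 for the NameNode, 50030 for the JobTracker); ui_reachable is a hypothetical helper name.

```shell
# Sketch: report whether a web UI answers an HTTP request.
# ui_reachable is a hypothetical helper name; curl must be installed.
ui_reachable() {
    url=$1
    # -s: quiet, -f: treat HTTP errors as failure, --max-time: do not hang
    if curl -sf -o /dev/null --max-time 5 "$url"; then
        echo "up: $url"
    else
        echo "down: $url"
        return 1
    fi
}

# Usage (ports are the Hadoop 1.x defaults, assumed here):
# ui_reachable http://localhost:50070/
# ui_reachable http://localhost:50030/
```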
Stop all the daemons via your terminal:
bin/stop-all.sh
Congratulations! You have successfully set up a Single Node Pseudo-Distributed cluster on your local machine.
References - https://www.udemy.com/hadoop-tutorial/
I followed the steps given in these video lectures and have explained them here, with the difficulties I faced during setup already resolved.