How to Install Hadoop 2.7 in Ubuntu
Hadoop is an open-source project of the Apache Software Foundation. It is a Java-based framework for storing and processing very large data sets across a cluster of machines.
Configuring a Hadoop cluster can seem difficult at first, but you can also install Hadoop on a single machine to perform basic operations.
Although Hadoop looks like a single piece of software, it has several components behind it. Here are some of them.
Hadoop Common - A collection of utilities and libraries that support the other Hadoop modules.
HDFS (Hadoop Distributed File System) - The distributed file system that stores the data on the disks of the cluster machines.
YARN - Short for Yet Another Resource Negotiator, it manages the cluster's resources and schedules the jobs that run on it.
MapReduce - A programming model for processing big data sets in the cluster using parallel, distributed algorithms.
In this article, we will install and configure Hadoop 2.7.x on Ubuntu. Follow the steps below to install Hadoop 2.7.
Prerequisites
If you are on Windows or macOS, create a virtual machine with VMware Player or Oracle VirtualBox, install Ubuntu in it, and then install Hadoop 2.7 inside that virtual machine.
Step I: Install Oracle Java version 8
1.Install the python-software-properties package
2.Add the Java repository
3.Update the source list
4.Install Oracle Java 8
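The four steps above can be sketched as the shell function below. The PPA name (ppa:webupd8team/java) and the package name oracle-java8-installer are assumptions, since the article does not name them, and that PPA may no longer be available on current Ubuntu releases. The commands are wrapped in a function so that nothing is installed until you call it.

```shell
# Sketch of Step I. The PPA and package names are assumptions, not taken
# from the article; adjust them to whatever Java 8 source you actually use.
install_oracle_java8() {
    # 1. Package that provides the add-apt-repository command
    sudo apt-get install -y python-software-properties
    # 2. Add the repository that hosts the Java installer (assumed PPA)
    sudo add-apt-repository -y ppa:webupd8team/java
    # 3. Refresh the package source list
    sudo apt-get update
    # 4. Install Java 8 (assumed package name)
    sudo apt-get install -y oracle-java8-installer
    # Print the installed version as a quick check
    java -version
}
```

Run install_oracle_java8 in a terminal once the function is defined.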
Step II: Set Up Passwordless SSH
1.Install the OpenSSH client and the OpenSSH server
2.Generate Private and Public Key Pairs
3.Configure your password-less SSH
4.Check the setup by connecting to localhost over SSH
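A minimal sketch of these four steps follows. It assumes an RSA key with an empty passphrase stored at the default location ~/.ssh/id_rsa; the key type and path are illustrative choices, not requirements stated in the article.

```shell
# Sketch of Step II. Key type and file locations are assumptions.
setup_passwordless_ssh() {
    # 1. Install the SSH client and server
    sudo apt-get install -y openssh-client openssh-server
    # 2. Generate a key pair with an empty passphrase
    mkdir -p "$HOME/.ssh"
    chmod 700 "$HOME/.ssh"
    ssh-keygen -t rsa -P "" -f "$HOME/.ssh/id_rsa"
    # 3. Authorize the new public key for logins to this machine
    cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
    chmod 600 "$HOME/.ssh/authorized_keys"
    # 4. This login should not ask for a password; type "exit" to return
    ssh localhost
}
```

If ssh localhost still prompts for a password, check the permissions on ~/.ssh and authorized_keys first.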
Step III: Configuration, Setup, and Installation of Hadoop
1.First, download Hadoop
https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.g
2.Untar Tarball
All the mandatory jars, scripts and configuration files are available in the HADOOP_HOME directory.
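The download and untar steps can be sketched as two small helpers. The function names and the default target directory ($HOME) are hypothetical. The URL is passed as an argument, so use the download link from step 1.

```shell
# Sketch of the download and untar steps. Function names are hypothetical.

# Download the tarball at the given URL into the current directory.
download_hadoop() {
    wget "$1"
}

# Unpack a Hadoop tarball into a target directory (default: $HOME).
untar_hadoop() {
    tarball="$1"
    dest="${2:-$HOME}"
    mkdir -p "$dest"
    tar -xzf "$tarball" -C "$dest"
}
```

For example, untar_hadoop hadoop-2.7.1.tar.gz unpacks the release into $HOME/hadoop-2.7.1, which then serves as HADOOP_HOME.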
3.Configuration Setup
Edit .bashrc file
Add the Hadoop environment variables to the .bashrc file in the user's home directory. The variables take effect after you restart the terminal or run source ~/.bashrc.
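As a sketch, the helper below appends a typical set of variables to a shell startup file. The install path $HOME/hadoop-2.7.1 and the function name are assumptions; change the path to wherever you unpacked Hadoop.

```shell
# Append Hadoop environment variables to a startup file (default: ~/.bashrc).
# The install path written below is an assumption.
append_hadoop_env() {
    rc_file="${1:-$HOME/.bashrc}"
    # The quoted 'EOF' keeps $HOME and $PATH literal, so they are expanded
    # when the startup file is read, not when this function runs.
    cat >> "$rc_file" <<'EOF'
# Hadoop environment
export HADOOP_HOME="$HOME/hadoop-2.7.1"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
EOF
}
```

Call append_hadoop_env once, then open a new terminal so that commands such as hdfs and start-dfs.sh are found on the PATH.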
Edit hadoop-env.sh
Edit the hadoop-env.sh file, located in etc/hadoop inside the Hadoop installation directory, and set JAVA_HOME to the path of your Java installation.
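One way to do this without opening an editor is to append an export line to the end of the file, where it overrides any earlier JAVA_HOME setting. The default path /usr/lib/jvm/java-8-oracle is an assumption about where Java 8 was installed; check the actual location on your machine.

```shell
# Append a JAVA_HOME export to hadoop-env.sh.
# Usage: set_java_home ENV_FILE [JAVA_HOME_PATH]
# The default Java path is an assumption.
set_java_home() {
    env_file="$1"
    java_home="${2:-/usr/lib/jvm/java-8-oracle}"
    printf '\nexport JAVA_HOME=%s\n' "$java_home" >> "$env_file"
}
```

For example: set_java_home "$HADOOP_HOME/etc/hadoop/hadoop-env.sh".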
Edit XML file (core-site.xml)
Edit the core-site.xml file, located in etc/hadoop inside the Hadoop installation directory, and add the required entries.
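A minimal single-node sketch of this file sets only the default file system. The address hdfs://localhost:9000 is an assumption (a common choice for a single-node setup), and the helper name is hypothetical.

```shell
# Write a minimal core-site.xml into the given configuration directory.
# Usage: write_core_site CONF_DIR
write_core_site() {
    conf_dir="${1:?usage: write_core_site CONF_DIR}"
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Address of the NameNode; host and port are assumptions -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
}
```

For example: write_core_site "$HADOOP_HOME/etc/hadoop". Note that this overwrites any existing core-site.xml in that directory.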
Edit XML file (hdfs-site.xml)
Edit the hdfs-site.xml file, located in etc/hadoop inside the Hadoop installation directory, and add the required entries.
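On a single machine there is only one DataNode, so a minimal sketch of this file sets the block replication factor to 1. The value and the helper name are assumptions for a single-node setup.

```shell
# Write a minimal hdfs-site.xml into the given configuration directory.
# Usage: write_hdfs_site CONF_DIR
write_hdfs_site() {
    conf_dir="${1:?usage: write_hdfs_site CONF_DIR}"
    cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- One copy of each block, since there is only one DataNode -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
}
```

For example: write_hdfs_site "$HADOOP_HOME/etc/hadoop".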
Edit XML file (mapred-site.xml)
Edit the mapred-site.xml file, located in etc/hadoop inside the Hadoop installation directory, and add the required entries.
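Hadoop 2.7 ships only a mapred-site.xml.template, so mapred-site.xml usually has to be created first. The sketch below writes the file directly and tells MapReduce to run on YARN; the helper name is hypothetical.

```shell
# Write a minimal mapred-site.xml into the given configuration directory.
# Usage: write_mapred_site CONF_DIR
write_mapred_site() {
    conf_dir="${1:?usage: write_mapred_site CONF_DIR}"
    cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Run MapReduce jobs on YARN instead of the local runner -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
}
```

For example: write_mapred_site "$HADOOP_HOME/etc/hadoop".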
Edit XML file (yarn-site.xml)
Edit the yarn-site.xml file, located in etc/hadoop inside the Hadoop installation directory, and add the required entries.
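For MapReduce jobs to run on YARN, the NodeManager needs the shuffle auxiliary service. A minimal sketch of the file, with a hypothetical helper name, might look like this:

```shell
# Write a minimal yarn-site.xml into the given configuration directory.
# Usage: write_yarn_site CONF_DIR
write_yarn_site() {
    conf_dir="${1:?usage: write_yarn_site CONF_DIR}"
    cat > "$conf_dir/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Shuffle service that MapReduce needs on each NodeManager -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
}
```

For example: write_yarn_site "$HADOOP_HOME/etc/hadoop".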
Step IV: Start the Cluster
1.Format the NameNode
Format it only once, when you first install Hadoop. Formatting again erases the existing HDFS metadata.
2.Start HDFS services.
3.Start YARN services.
4.Check whether the services have started.
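The four steps above can be sketched as follows, assuming $HADOOP_HOME/bin and $HADOOP_HOME/sbin are on the PATH. The function names are hypothetical. missing_daemons is a small helper that reads jps output and prints every expected single-node daemon that is not running.

```shell
# 1. Format the NameNode (run only once, on first install).
format_namenode() {
    hdfs namenode -format
}

# 2-4. Start HDFS and YARN, then list the running Java processes.
start_cluster() {
    start-dfs.sh
    start-yarn.sh
    jps
}

# Read jps output on stdin and print each expected daemon that is absent.
missing_daemons() {
    jps_output="$(cat)"
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>", so compare the second field exactly
        printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$daemon"
    done
}
```

After start_cluster, running jps | missing_daemons should print nothing when all five daemons are up.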
Step V: Run MapReduce Jobs
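The article does not say which job to run, so the sketch below is an assumption: it runs the wordcount example from the examples jar bundled with Hadoop 2.7.1, using the Hadoop configuration files as input. It assumes HADOOP_HOME is set and the cluster is running. The function name and the HDFS directory names input and output are illustrative.

```shell
# Run the bundled wordcount example on the cluster (illustrative sketch).
run_wordcount() {
    examples_jar="$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar"
    # Create an input directory under the user's HDFS home and upload files
    hdfs dfs -mkdir -p input
    hdfs dfs -put "$HADOOP_HOME"/etc/hadoop/*.xml input
    # The output directory must not exist before the job starts
    hadoop jar "$examples_jar" wordcount input output
    # Print the word counts written by the reducers
    hdfs dfs -cat 'output/part-r-*'
}
```

To rerun the job, remove the old results first with hdfs dfs -rm -r output.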
Step VI: Stopping the Cluster
1.Stop HDFS services.
2.Stop YARN services.
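Both steps can be sketched as one function, assuming $HADOOP_HOME/sbin is on the PATH. The function name is hypothetical.

```shell
# Stop the HDFS and YARN daemons, in the order listed above.
stop_cluster() {
    stop-dfs.sh
    stop-yarn.sh
    # Afterwards jps should no longer list any Hadoop daemons
    jps
}
```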
Summing Up
That brings us to the end of this tutorial on installing Hadoop 2.7 on Ubuntu in just 15 minutes. We would like to hear your feedback on it. Keep Learning!
Author Bio:
HP Morgan works as a Tech Analyst at TatvaSoft.com.au, a custom software and web development company in Australia. He loves to travel to natural places.