SINGLE-NODE [STANDALONE] CLUSTER INSTALLATION
This report describes the steps required to set up a single-node Hadoop cluster, backed by the Hadoop Distributed File System (HDFS), running on Ubuntu Linux.
Hadoop is a framework written in Java for running applications on large clusters of commodity hardware and incorporates features similar to those of the Google File System (GFS) and of the MapReduce computing paradigm.
Hadoop’s HDFS is a highly fault-tolerant distributed file system and, like Hadoop in general, designed to be deployed on low-cost hardware. It provides high throughput access to application data and is suitable for applications that have large data sets.
Prerequisites
System Requirements
| Requirement | Recommended |
| --- | --- |
| Operating System | Ubuntu 64-bit |
| RAM | 2 GB or more |
| HDD | 40 GB or more |
Install Java
Hadoop requires Java 1.6 (Java 6) or later. Check whether Java is already installed, or install it with apt-get from the terminal. We have used Oracle Java 8.
Step 1 : Check if Java is Installed
java -version
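If Java is installed, the version is printed; for Oracle Java 8 the output looks roughly like the following (the exact build numbers will differ):
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)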
If it is not installed, install OpenJDK or Oracle Java 7 or 8; in this demo we have installed Oracle Java 8.
Step 2 : Add the ‘ppa:webupd8team/java’ repository, update the source list, and install Java 8
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
Check that Java is correctly installed using the command from Step 1.
Configuring SSH:
The Hadoop control scripts rely on SSH to perform cluster-wide operations. For example, there is a script for stopping and starting all the daemons in the cluster. To work seamlessly, SSH needs to be set up to allow password-less login for the hadoop user from machines in the cluster. The simplest way to achieve this is to generate a public/private key pair and share it across the cluster. Hadoop requires SSH access to manage its nodes, and this applies even to a single-node setup, where SSH connects to localhost.
Step 1 : Generate an SSH key
ssh-keygen -t rsa -P ""
Note:
- -P "" indicates an empty passphrase.
- If asked for a filename, just leave it blank and press the Enter key to continue.
Step 2 : Enable SSH access to your local machine with the newly created key using the following command.
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
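If SSH still prompts for a password afterwards, overly permissive file modes on the key files are a common cause; they can be tightened like this:
chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/authorized_keys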
Step 3 : The final step is to test the SSH setup by connecting to the local machine as the hduser user. This step is also needed to save your local machine’s host key fingerprint to the hduser user’s known_hosts file.
ssh localhost
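On the very first connection SSH does not know the host yet, so it shows a prompt similar to the following; type yes to accept, and the fingerprint is stored in ~/.ssh/known_hosts:
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:...
Are you sure you want to continue connecting (yes/no)? yes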
Installation
Install Hadoop
Step 1 : Download the Hadoop installation file (tarball) from the official Apache website using the following command.
sudo wget https://www-eu.apache.org/dist/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
Step 2 : Extract the tar file using the following command
tar xvzf hadoop-2.9.2.tar.gz
Step 3 : Move the extracted Hadoop folder to the /usr/local/ directory using the following command
sudo mv hadoop-2.9.2 /usr/local/
Step 4 : Change the ownership of hadoop-2.9.2 to the dedicated user account.
sudo chown -R <user>:<user_group> /usr/local/hadoop-2.9.2/
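For example, assuming a dedicated hduser account in a hadoop group (placeholder names; substitute your own):
sudo chown -R hduser:hadoop /usr/local/hadoop-2.9.2/   # hduser and hadoop are example names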
Setup Environment Variables for Hadoop
Step 1 : Open the ~/.bashrc file to add environment variables using the following command
vi ~/.bashrc
Step 2 : Add the following lines at the end of the ~/.bashrc file.
export JAVA_HOME=/usr/lib/jvm/java-8-oracle   # path of the Java JDK installation
export HADOOP_INSTALL=/usr/local/hadoop-2.9.2   # path of the Hadoop folder
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin   # run Hadoop commands directly from anywhere
Step 3 : Reload the ~/.bashrc file so the changes take effect.
source ~/.bashrc
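To confirm that the variables were picked up, echo one of them; it should print /usr/local/hadoop-2.9.2:
echo $HADOOP_INSTALL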
Step 4 : Check whether Hadoop is installed correctly
hadoop version
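The first line of the output should name the release:
Hadoop 2.9.2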
Step 5 : Configure HDFS by appending the following lines to hadoop-2.9.2/etc/hadoop/hdfs-site.xml. The replication factor is set to 1 because there is only a single node.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop-2.9.2/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop-2.9.2/datanode</value>
</property>
</configuration>
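The namenode and datanode directories referenced above do not exist yet. Creating them now, as the same user that owns the Hadoop folder, avoids permission problems later:
mkdir -p /usr/local/hadoop-2.9.2/namenode
mkdir -p /usr/local/hadoop-2.9.2/datanode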
Step 6 : Set JAVA_HOME in the hadoop-2.9.2/etc/hadoop/hadoop-env.sh file
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
Step 7 : Append the following lines to the hadoop-2.9.2/etc/hadoop/core-site.xml file. This sets the default file system URI that clients use to reach the NameNode.
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Step 8 : Format the Hadoop file system
hadoop namenode -format
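A successful format ends with a log line similar to:
... common.Storage: Storage directory /usr/local/hadoop-2.9.2/namenode has been successfully formatted.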
Step 9 : Start the daemons
start-all.sh
To check all the running daemons
jps
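On a healthy single-node setup, jps should list the five Hadoop daemons plus Jps itself (the process IDs will differ):
2081 NameNode
2234 DataNode
2405 SecondaryNameNode
2560 ResourceManager
2703 NodeManager
2950 Jps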
Web UI of the NameNode daemon, after starting the single-node cluster:
http://localhost:50070/
Web UI of the JobHistoryServer daemon, after starting the single-node cluster:
http://localhost:19888/
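Note that start-all.sh does not launch the JobHistoryServer; it is started separately with its own control script:
mr-jobhistory-daemon.sh start historyserver
When finished, the companion script stops all the daemons again:
stop-all.sh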