How to set up Hadoop as a single-node cluster in CentOS

To set up Hadoop as a single-node cluster in CentOS

Hadoop is open-source software for processing big data. When the data size is beyond the capacity of a single machine, it stores and processes the data in a distributed manner across all the datanodes in the cluster. The data is stored in the Hadoop Distributed File System (HDFS).

To install Java

Hadoop requires Java. Go to the /opt/ directory, then download and extract the Java package. Visit the following link to download Java.

http://download.oracle.com/otn-pub/java/jdk/8u91-b14/jdk-8u91-linux-x64.tar.gz

[root@linuxhelp ~]# cd /opt/
[root@linuxhelp ~]# wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u91-b14/jdk-8u91-linux-x64.tar.gz
[root@linuxhelp opt]# tar -xzf jdk-8u91-linux-x64.tar.gz 
[root@linuxhelp opt]# ls
jdk1.8.0_91  jdk-8u91-linux-x64.tar.gz  rh

After extracting, use the alternatives command to install Java.

[root@linuxhelp opt]# cd jdk1.8.0_91/
[root@linuxhelp jdk1.8.0_91]# alternatives --install /usr/bin/java java /opt/jdk1.8.0_91/bin/java 2
[root@linuxhelp jdk1.8.0_91]# alternatives --config java
There are 5 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java
   2           /usr/lib/jvm/jre-1.5.0-gcj/bin/java
   3           /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java
   4           /usr/lib/jvm/jre-1.6.0-openjdk.x86_64/bin/java
   5           /opt/jdk1.8.0_91/bin/java

Enter to keep the current selection[+], or type selection number: 5

Set up the paths for the jar and javac commands using alternatives.

[root@linuxhelp jdk1.8.0_91]# alternatives --install /usr/bin/jar jar /opt/jdk1.8.0_91/bin/jar 2
[root@linuxhelp jdk1.8.0_91]# alternatives --install /usr/bin/javac javac /opt/jdk1.8.0_91/bin/javac 2
[root@linuxhelp jdk1.8.0_91]# alternatives --set jar /opt/jdk1.8.0_91/bin/jar
[root@linuxhelp jdk1.8.0_91]# alternatives --set javac /opt/jdk1.8.0_91/bin/javac
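
To confirm that the compiler link is live as well, you can check the javac version (the expected output for this JDK release is shown).

[root@linuxhelp jdk1.8.0_91]# javac -version
javac 1.8.0_91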

After the Java installation, check the Java version.

[root@linuxhelp jdk1.8.0_91]# java -version
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)

Next, configure the environment variables as follows.

[root@linuxhelp jdk1.8.0_91]# export JAVA_HOME=/opt/jdk1.8.0_91/
[root@linuxhelp jdk1.8.0_91]# export JRE_HOME=/opt/jdk1.8.0_91/jre
[root@linuxhelp jdk1.8.0_91]# export PATH=$PATH:/opt/jdk1.8.0_91/bin:/opt/jdk1.8.0_91/jre/bin/
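
Note that these exports apply only to the current shell session. To make them persist across logins, one common approach on CentOS is to place them in a script under /etc/profile.d/ (the file name java.sh below is just a suggestion):

[root@linuxhelp jdk1.8.0_91]# cat <<'EOF' > /etc/profile.d/java.sh
export JAVA_HOME=/opt/jdk1.8.0_91/
export JRE_HOME=/opt/jdk1.8.0_91/jre
export PATH=$PATH:/opt/jdk1.8.0_91/bin:/opt/jdk1.8.0_91/jre/bin/
EOF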

To install Hadoop

Create a user for hadoop as follows.

[root@linuxhelp ~]# useradd hadoop
[root@linuxhelp ~]# passwd hadoop
Changing password for user hadoop.
New password: 
Retype new password: 
passwd: all authentication tokens updated successfully.

After creating the user account, switch to the hadoop user account and generate the SSH key pair as follows.

Hadoop must be installed from the hadoop user account only; all of the configuration shown below is done in the hadoop user account.

[root@linuxhelp ~]# su - hadoop
[hadoop@linuxhelp ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
9e:a1:31:11:21:70:32:d3:ba:52:44:f1:76:b8:b6:c4 hadoop@linuxhelp.com
The key's randomart image is:
+--[ RSA 2048]----+
| .B+o o.         |
|  .*.o .         |
| . .+ o          |
|  oo o .         |
| . .E o S        |
|. .o . = o       |
| .  . . o        |
|                 |
|                 |
+-----------------+
[hadoop@linuxhelp ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@linuxhelp ~]$ chmod 0600 ~/.ssh/authorized_keys
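
The Hadoop start scripts log in to localhost over SSH, so it is worth verifying key-based login now. On the first connection you will be asked to accept the host key; if the key was created with an empty passphrase, no password prompt should appear:

[hadoop@linuxhelp ~]$ ssh localhost exit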

Navigate to the following link to download Hadoop and choose the version accordingly.

http://apache.claz.org/hadoop/common/

[hadoop@linuxhelp ~]$ cd
[hadoop@linuxhelp ~]$ pwd
/home/hadoop
[hadoop@linuxhelp ~]$ wget http://apache.claz.org/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
[hadoop@linuxhelp ~]$ tar -xzf hadoop-2.7.2.tar.gz 
[hadoop@linuxhelp ~]$ ls
hadoop-2.7.2  hadoop-2.7.2.tar.gz
[hadoop@linuxhelp ~]$ mv hadoop-2.7.2 hadoop
[hadoop@linuxhelp ~]$ ls
hadoop  hadoop-2.7.2.tar.gz

Edit the .bashrc file located in the /home/hadoop directory as follows.

A .bashrc file also exists in the root user's home directory; do not edit that one.

[hadoop@linuxhelp ~]$ vim .bashrc

export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Run the following command to apply the changes.

[hadoop@linuxhelp ~]$ source ~/.bashrc

Edit the /home/hadoop/hadoop/etc/hadoop/hadoop-env.sh file and add the line mentioned below to export JAVA_HOME.

[hadoop@linuxhelp ~]$ cd hadoop/etc/hadoop/
[hadoop@linuxhelp hadoop]$ ls
capacity-scheduler.xml  hadoop-env.sh               httpfs-env.sh            kms-env.sh            mapred-env.sh               ssl-server.xml.example
configuration.xsl       hadoop-metrics2.properties  httpfs-log4j.properties  kms-log4j.properties  mapred-queues.xml.template  yarn-env.cmd
container-executor.cfg  hadoop-metrics.properties   httpfs-signature.secret  kms-site.xml          mapred-site.xml.template    yarn-env.sh
core-site.xml           hadoop-policy.xml           httpfs-site.xml          log4j.properties      slaves                      yarn-site.xml
hadoop-env.cmd          hdfs-site.xml               kms-acls.xml             mapred-env.cmd        ssl-client.xml.example
[hadoop@linuxhelp hadoop]$ vim hadoop-env.sh

Comment out the following line.

export JAVA_HOME=${JAVA_HOME}

And add the following line into it.

export JAVA_HOME=/opt/jdk1.8.0_91/
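
With JAVA_HOME now pointing at the installed JDK, a quick sanity check confirms that Hadoop itself runs (only the first line of the output is shown here; the command also prints build details):

[hadoop@linuxhelp hadoop]$ hadoop version
Hadoop 2.7.2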

Now edit the Hadoop configuration files for the single-node cluster in the same directory.

[hadoop@linuxhelp hadoop]$ vim core-site.xml

Add the following entries into it.

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

[hadoop@linuxhelp hadoop]$ vim hdfs-site.xml

Add the following entries into it.

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
  </property>
</configuration>
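
Hadoop creates these storage directories on demand, but creating them up front lets you confirm that the hadoop user has the right permissions:

[hadoop@linuxhelp hadoop]$ mkdir -p /home/hadoop/hadoopdata/hdfs/namenode /home/hadoop/hadoopdata/hdfs/datanode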

[hadoop@linuxhelp hadoop]$ cp mapred-site.xml.template mapred-site.xml
[hadoop@linuxhelp hadoop]$ vim mapred-site.xml

Add the following entries into it.

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

[hadoop@linuxhelp hadoop]$ vim yarn-site.xml

Add the following entries into it.

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Now format the namenode using the following command.

[hadoop@linuxhelp hadoop]$ hdfs namenode -format
16/06/21 15:32:13 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = linuxhelp.com/192.168.5.103
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.2
STARTUP_MSG:   classpath = /home/hadoop/hadoop/etc/hadoop:/home/hadoop/hadoop/share/hadoop/common/lib/curator-client-2.7.1.jar:/home/hadoop/hadoop/share/hadoop/common/lib/asm-3.2.jar:/home/hadoop/hadoop/share/hadoop/common/lib/commons-digester-
.
.
.
16/06/21 15:32:16 INFO namenode.FSImage: Allocated new BlockPoolId: BP-2052947587-192.168.5.103-1466503336749
16/06/21 15:32:17 INFO common.Storage: Storage directory /home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted.
16/06/21 15:32:17 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
16/06/21 15:32:17 INFO util.ExitUtil: Exiting with status 0
16/06/21 15:32:17 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at linuxhelp.com/192.168.5.103
************************************************************/

Start the services by executing the start-dfs.sh and start-yarn.sh scripts located in the /home/hadoop/hadoop/sbin directory as follows.

[hadoop@linuxhelp hadoop]$ cd /home/hadoop/hadoop/sbin/
[hadoop@linuxhelp sbin]$ start-dfs.sh
16/06/21 15:35:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is e4:e3:9b:a5:f3:b9:4a:01:a7:93:30:6e:fd:8f:f9:95.
Are you sure you want to continue connecting (yes/no)? yes
.
.
.
Enter passphrase for key '/home/hadoop/.ssh/id_rsa':
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-secondarynamenode-linuxhelp.com.out
16/06/21 15:36:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

[hadoop@linuxhelp sbin]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-resourcemanager-linuxhelp.com.out
Enter passphrase for key '/home/hadoop/.ssh/id_rsa':
localhost: starting nodemanager, logging to /home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-linuxhelp.com.out
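
When you later need to shut the cluster down, the matching stop scripts in the same sbin directory reverse these steps:

[hadoop@linuxhelp sbin]$ stop-yarn.sh
[hadoop@linuxhelp sbin]$ stop-dfs.sh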

To check the started services using the jps command

Now exit from the hadoop user account and switch to the root account. Run the command below to register the jps command for Java.

[root@linuxhelp ~]# update-alternatives --install /usr/bin/jps jps /opt/jdk1.8.0_91/bin/jps 1

Run the jps command to verify the started services.

[root@linuxhelp ~]# jps
13376 NameNode
14184 NodeManager
13707 SecondaryNameNode
13515 DataNode
14495 Jps
13887 ResourceManager
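
As a final smoke test, you can switch back to the hadoop account and create a directory in HDFS (the path /test is just an example); the listing should then show the new directory:

[root@linuxhelp ~]# su - hadoop
[hadoop@linuxhelp ~]$ hdfs dfs -mkdir /test
[hadoop@linuxhelp ~]$ hdfs dfs -ls /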

Hadoop is now configured and its services are running. Open your browser and go to http://localhost:50070 for the NameNode information.

Enter http://localhost:8088 in the browser for cluster information.

Type http://localhost:50090 for Secondary NameNode information.
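
If the machine has no graphical browser, the same endpoints can be checked from the shell (assuming curl is installed; a 200 response code means the page is being served):

[hadoop@linuxhelp ~]$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070
200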


FAQ
Q
How to create a user for hadoop?
A
Create a user for hadoop as follows.

useradd hadoop
passwd hadoop

After creating a user account for hadoop, switch to the hadoop user account.
Q
How to check the Java version?
A
In order to check the Java version, use the following command:

java -version
Q
How to use the alternatives command to install Java?
A
Following is the command to install Java using alternatives:

alternatives --install /usr/bin/java java /opt/jdk1.8.0_91/bin/java 2
Q
What does /etc/init.d do?
A
/etc/init.d is where daemon (service) scripts are placed and where their status can be checked. It is Linux specific and has nothing to do with Hadoop.
Q
What if a NameNode has no data?
A
If a NameNode has no data, it cannot be part of the Hadoop cluster.