How to Install Hadoop in Ubuntu
To Install Hadoop in Ubuntu
Hadoop is a Java-based programming framework that supports the storage of large data-sets on a cluster. This article shows the installation process of Hadoop in Ubuntu.
Installation of Hadoop
Initially, you need to update your system with the following command.
root@linuxhelp1:~# apt-get update
Get:1 http://security.ubuntu.com/ubuntu xenial-security InRelease [94.5 kB]
Hit:2 http://in.archive.ubuntu.com/ubuntu xenial InRelease
Get:3 http://in.archive.ubuntu.com/ubuntu xenial-updates InRelease [95.7 kB]
Get:4 http://in.archive.ubuntu.com/ubuntu xenial-backports InRelease [92.2 kB]
Get:5 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages [163 kB]
Get:6 http://in.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [426 kB]
Get:7 http://security.ubuntu.com/ubuntu xenial-security/main i386 Packages [159 kB]
.
.
.
Get:20 http://in.archive.ubuntu.com/ubuntu xenial-updates/universe i386 Packages [359 kB]
Get:21 http://in.archive.ubuntu.com/ubuntu xenial-updates/universe Translation-en [131 kB]
Get:22 http://in.archive.ubuntu.com/ubuntu xenial-backports/main amd64 DEP-11 Metadata [195 B]
Get:23 http://in.archive.ubuntu.com/ubuntu xenial-backports/universe amd64 DEP-11 Metadata [200 B]
Fetched 3,371 kB in 18s (181 kB/s)
Reading package lists... Done
Then install OpenJDK, whcih is the default Java Development Kit in Ubuntu machine.
root@linuxhelp1:~# apt-get install default-jdk -y
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
ca-certificates-java default-jdk-headless default-jre default-jre-headless fonts-dejavu-extra java-common libbonobo2-0
libbonobo2-common libgif7 libgnome-2-0 libgnome2-common libgnomevfs2-0 libgnomevfs2-common libice-dev liborbit-2-0
libpthread-stubs0-dev libsm-dev libx11-dev libx11-doc libxau-dev libxcb1-dev libxdmcp-dev libxt-dev openjdk-8-jdk
openjdk-8-jdk-headless openjdk-8-jre openjdk-8-jre-headless x11proto-core-dev x11proto-input-dev x11proto-kb-dev
xorg-sgml-doctools xtrans-dev
Suggested packages:
default-java-plugin libbonobo2-bin desktop-base libgnomevfs2-bin libgnomevfs2-extra gamin | fam gnome-mime-data libice-doc
libsm-doc libxcb-doc libxt-doc openjdk-8-demo openjdk-8-source visualvm icedtea-8-plugin openjdk-8-jre-jamvm fonts-ipafont-gothic
fonts-ipafont-mincho ttf-wqy-microhei | ttf-wqy-zenhei fonts-indic
The following NEW packages will be installed:
ca-certificates-java default-jdk default-jdk-headless default-jre default-jre-headless fonts-dejavu-extra java-common
libbonobo2-0 libbonobo2-common libgif7 libgnome-2-0 libgnome2-common libgnomevfs2-0 libgnomevfs2-common libice-dev liborbit-2-0
libpthread-stubs0-dev libsm-dev libx11-dev libx11-doc libxau-dev libxcb1-dev libxdmcp-dev libxt-dev openjdk-8-jdk
openjdk-8-jdk-headless openjdk-8-jre openjdk-8-jre-headless x11proto-core-dev x11proto-input-dev x11proto-kb-dev
xorg-sgml-doctools xtrans-dev
0 upgraded, 33 newly installed, 0 to remove and 429 not upgraded.
Need to get 41.4 MB of archives.
After this operation, 169 MB of additional disk space will be used.
Get:1 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 libbonobo2-common all 2.32.1-3 [34.7 kB]
Get:2 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 liborbit-2-0 amd64 1:2.14.19-1build1 [140 kB]
Get:3 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 libbonobo2-0 amd64 2.32.1-3 [211 kB]
Get:4 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 java-common all 0.56ubuntu2 [7,742 B]
Get:5 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 default-jre-headless amd64 2:1.8-56ubuntu2 [4,380 B]
Get:6 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 ca-certificates-java all 20160321 [12.9 kB]
Get:7 http://in.archive.ubuntu.com/ubuntu xenial-updates/main amd64 openjdk-8-jre-headless amd64 8u111-b14-2ubuntu0.16.04.2 [26.9 MB]
Get:8 http://in.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libgif7 amd64 5.1.4-0.3~16.04 [30.5 kB]
Get:9 http://in.archive.ubuntu.com/ubuntu xenial-updates/main amd64 openjdk-8-jre amd64 8u111-b14-2ubuntu0.16.04.2 [69.4 kB]
Get:10 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 default-jre amd64 2:1.8-56ubuntu2 [980 B]
.
.
.
update-alternatives: using /usr/lib/jvm/java-8-openjdk-amd64/bin/jdb to provide /usr/bin/jdb (jdb) in auto mode
update-alternatives: using /usr/lib/jvm/java-8-openjdk-amd64/bin/serialver to provide /usr/bin/serialver (serialver) in auto mode
update-alternatives: using /usr/lib/jvm/java-8-openjdk-amd64/bin/wsgen to provide /usr/bin/wsgen (wsgen) in auto mode
update-alternatives: using /usr/lib/jvm/java-8-openjdk-amd64/bin/jcmd to provide /usr/bin/jcmd (jcmd) in auto mode
update-alternatives: using /usr/lib/jvm/java-8-openjdk-amd64/bin/jarsigner to provide /usr/bin/jarsigner (jarsigner) in auto mode
Setting up default-jdk-headless (2:1.8-56ubuntu2) ...
Setting up openjdk-8-jdk:amd64 (8u111-b14-2ubuntu0.16.04.2) ...
update-alternatives: using /usr/lib/jvm/java-8-openjdk-amd64/bin/appletviewer to provide /usr/bin/appletviewer (appletviewer) in auto mode
update-alternatives: using /usr/lib/jvm/java-8-openjdk-amd64/bin/jconsole to provide /usr/bin/jconsole (jconsole) in auto mode
Setting up default-jdk (2:1.8-56ubuntu2) ...
Processing triggers for libc-bin (2.23-0ubuntu3) ...
Once the installation is completed, use the following command to check the version.
root@linuxhelp1:~# java -version
openjdk version " 1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
Now you need to download the hadoop package.
root@linuxhelp1:~# wget http://apache.mirrors.tds.net/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
--2016-11-22 03:07:07-- http://apache.mirrors.tds.net/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
Resolving apache.mirrors.tds.net (apache.mirrors.tds.net)... 216.165.129.134
Connecting to apache.mirrors.tds.net (apache.mirrors.tds.net)|216.165.129.134|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 214092195 (204M) [application/x-gzip]
Saving to: ‘ hadoop-2.7.3.tar.gz’
hadoop-2.7.3.tar.gz 100%[==========================================================> ] 204.17M 87.4KB/s in 31m 30s
2016-11-22 03:38:39 (111 KB/s) - ‘ hadoop-2.7.3.tar.gz’ saved [214092195/214092195]
In order to check the downloaded file using SHA-256.
root@linuxhelp1:~# wget https://dist.apache.org/repos/dist/release/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz.mds
--2016-11-22 03:39:20-- https://dist.apache.org/repos/dist/release/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz.mds
Resolving dist.apache.org (dist.apache.org)... 209.188.14.144
Connecting to dist.apache.org (dist.apache.org)|209.188.14.144|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 958 [application/x-gzip]
Saving to: ‘ hadoop-2.7.3.tar.gz.mds’
hadoop-2.7.3.tar.gz.mds 100%[==========================================================> ] 958 --.-KB/s in 0s
2016-11-22 03:39:21 (42.7 MB/s) - ‘ hadoop-2.7.3.tar.gz.mds’ saved [958/958]
Utilize the following command for verification.
root@linuxhelp1:~# shasum -a 256 hadoop-2.7.3.tar.gz
d489df3808244b906eb38f4d081ba49e50c4603db03efd5e594a1e98b09259c2 hadoop-2.7.3.tar.gz
Here you need to compare the value with the SHA-256 value in the .mds file.
root@linuxhelp1:~# cat hadoop-2.7.3.tar.gz.mds
hadoop-2.7.3.tar.gz: MD5 = 34 55 BB 57 E4 B4 90 6B BE A6 7B 58 CC A7 8F A8
hadoop-2.7.3.tar.gz: SHA1 = B84B 8989 3426 9C68 753E 4E03 6D21 395E 5A4A B5B1
hadoop-2.7.3.tar.gz: RMD160 = 8FE4 A91E 8C67 2A33 C4E9 61FB 607A DBBD 1AE5 E03A
hadoop-2.7.3.tar.gz: SHA224 = 23AB1EAB B7648921 7101671C DCF9D774 7B84AD50
6A74E300 AE6617FA
hadoop-2.7.3.tar.gz: SHA256 = D489DF38 08244B90 6EB38F4D 081BA49E 50C4603D
B03EFD5E 594A1E98 B09259C2
hadoop-2.7.3.tar.gz: SHA384 = EFB42E60 3AF4FFB2 BA9F4CF4 1B56F71B D3F3BD8F
23331C25 27267762 FDEB67F0 F2B6F56D 797842DB
BB8C9F75 9DBA195D
hadoop-2.7.3.tar.gz: SHA512 = 52452D2F 7D0B308F 8BB53ADD B81D98D6 D71F3A7C
F5A0C5D8 311C17DD 902E052C 3F4ADD3F EE3C5EA2
E6C749D3 476E452F ED50818D 11001D87 CFEC039D
9A8BADE5
Extract the downloaded hadoop package.
root@linuxhelp1:~# tar -xvzf hadoop-2.7.3.tar.gz
hadoop-2.7.3/
hadoop-2.7.3/bin/
hadoop-2.7.3/bin/hadoop
hadoop-2.7.3/bin/hadoop.cmd
hadoop-2.7.3/bin/rcc
hadoop-2.7.3/bin/hdfs
hadoop-2.7.3/bin/hdfs.cmd
hadoop-2.7.3/bin/container-executor
hadoop-2.7.3/bin/test-container-executor
hadoop-2.7.3/bin/yarn
hadoop-2.7.3/bin/yarn.cmd
hadoop-2.7.3/bin/mapred
hadoop-2.7.3/bin/mapred.cmd
hadoop-2.7.3/etc/
hadoop-2.7.3/etc/hadoop/
hadoop-2.7.3/etc/hadoop/core-site.xml
hadoop-2.7.3/etc/hadoop/hadoop-env.cmd
hadoop-2.7.3/etc/hadoop/hadoop-metrics2.properties
hadoop-2.7.3/etc/hadoop/hadoop-policy.xml
.
.
.
hadoop-2.7.3/share/hadoop/tools/lib/asm-3.2.jar
hadoop-2.7.3/include/
hadoop-2.7.3/include/hdfs.h
hadoop-2.7.3/include/Pipes.hh
hadoop-2.7.3/include/TemplateFactory.hh
hadoop-2.7.3/include/StringUtils.hh
hadoop-2.7.3/include/SerialUtils.hh
hadoop-2.7.3/LICENSE.txt
hadoop-2.7.3/NOTICE.txt
hadoop-2.7.3/README.txt
Then move the extracted files into /usr/local.
root@linuxhelp1:~# mv hadoop-2.7.3 /usr/local/hadoop
Now configure the java home Hadoop requires that you set the path to Java, either as an environment variable or in the Hadoop configuration file.
root@linuxhelp1:~# readlink -f /usr/bin/java | sed " s:bin/java::"
/usr/lib/jvm/java-8-openjdk-amd64/jre/
Now open the hadoop-env.sh file and add the static value and dynamic value
root@linuxhelp1:~# nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/
export JAVA_HOME=$(readlink -f /usr/bin/java | sed " s:bin/java::" )
Now its time to run the hadoop.
root@linuxhelp1:~# /usr/local/hadoop/bin/hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
CLASSNAME run the class named CLASSNAME
or
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar run a jar file
note: please use " yarn jar" to launch
YARN applications, not this command.
checknative [-a|-h] check native hadoop and compression libraries availability
distcp copy file or directories recursively
archive -archiveName NAME -p * create a hadoop archive
classpath prints the class path needed to get the
credential interact with credential providers
Hadoop jar and the required libraries
daemonlog get/set the log level for each daemon
trace view and modify Hadoop tracing settings
Most commands print help when invoked w/o parameters.
Once the Hadoop is configured successfully, run it in stand-alone mode. Then create a new directory called input in home directory.
Copy the Hadoop' s configuration files into the newly created directory to use those files as our data.
root@linuxhelp1:~# mkdir ~/input root@linuxhelp1:~# cp /usr/local/hadoop/etc/hadoop/*.xml ~/input
Finally run the following command to grep the MapReduce hadoop-mapreduce-examples program and to execute its code.
root@linuxhelp1:~# /usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep ~/input ~/grep_example ' principle[.]*'
16/11/22 03:47:50 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/11/22 03:47:50 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/11/22 03:47:50 INFO input.FileInputFormat: Total input paths to process : 8
16/11/22 03:47:51 INFO mapreduce.JobSubmitter: number of splits:8
16/11/22 03:47:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local586866770_0001
16/11/22 03:47:51 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/11/22 03:47:51 INFO mapreduce.Job: Running job: job_local586866770_0001
16/11/22 03:47:51 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/11/22 03:47:51 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/11/22 03:47:51 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
16/11/22 03:47:52 INFO mapred.LocalJobRunner: Waiting for map tasks
16/11/22 03:47:52 INFO mapred.LocalJobRunner: Starting task: attempt_local586866770_0001_m_000000_0
16/11/22 03:47:52 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/11/22 03:47:52 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/11/22 03:47:52 INFO mapred.MapTask: Processing split: file:/home/user1/input/hadoop-policy.xml:0+9683
16/11/22 03:47:52 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
.
.
.
File System Counters
FILE: Number of bytes read=1247198
FILE: Number of bytes written=2317778
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=0
Map output records=0
Map output bytes=0
Map output materialized bytes=6
Input split bytes=115
Combine input records=0
Combine output records=0
Reduce input groups=0
Reduce shuffle bytes=6
Reduce input records=0
Reduce output records=0
Spilled Records=0
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=42
Total committed heap usage (bytes)=264036352
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=98
File Output Format Counters
Bytes Written=8
$ bin/hdfs namenode -format
then start NameNode daemon and DataNode daemon by
$ sbin/start-dfs.sh