Deploy and Configure a Single-Node Hadoop Cluster

2 hours
  • 10 Learning Objectives

About this Hands-on Lab

Many cloud platforms and third-party service providers offer Hadoop as a service or as a VM or container image, which lowers the barrier to entry for anyone getting started with Hadoop. In this hands-on lab, we instead deploy a single-node Hadoop cluster ourselves in a pseudo-distributed configuration. Doing so walks through the deployment and configuration of each individual Hadoop component, which prepares us for working with a multi-node cluster where Hadoop services are separated and clustered. In this learning activity, we will be performing the following:

* Installing Java
* Deploying Hadoop from an archive file
* Configuring Hadoop’s JAVA_HOME
* Configuring the default filesystem for Hadoop
* Configuring HDFS replication
* Setting up passwordless SSH
* Formatting the Hadoop Distributed File System (HDFS)
* Starting Hadoop
* Creating files and directories in Hadoop
* Examining a text file with a MapReduce job

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Install Java

Log into Node 1 as cloud_user and install the java-1.8.0-openjdk package:

sudo yum install java-1.8.0-openjdk -y

Deploy Hadoop

From the cloud_user home directory, download Hadoop 3.3.0 from your desired Apache mirror:

curl -O http://mirrors.gigenet.com/apache/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
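Before unpacking, it can be worth checking the download against the SHA-512 digest that Apache publishes for each release. The following is a minimal sketch and not part of the original lab steps: `verify_sha512` is a hypothetical helper name, and the expected digest has to be copied by hand from the Apache download page.

```shell
# Hypothetical helper: compare a file's SHA-512 digest with an expected hex string.
verify_sha512() {
  file="$1"
  expected="$2"
  # sha512sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(sha512sum "$file" | awk '{print $1}')"
  # Compare case-insensitively, since published digests are sometimes uppercase.
  if [ "$(printf '%s' "$actual" | tr 'A-F' 'a-f')" = "$(printf '%s' "$expected" | tr 'A-F' 'a-f')" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}
```

It would be called as verify_sha512 hadoop-3.3.0.tar.gz followed by the published digest in quotes; a non-zero exit status suggests the archive should be downloaded again.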

Unpack the archive in place:

tar -xzf hadoop-3.3.0.tar.gz

Delete the archive file:

rm hadoop-3.3.0.tar.gz

Rename the installation directory:

mv hadoop-3.3.0 hadoop

Configure JAVA_HOME

From /home/cloud_user/hadoop, set JAVA_HOME in etc/hadoop/hadoop-env.sh by changing the following line (in some Hadoop releases it appears commented out and without a value):

export JAVA_HOME=${JAVA_HOME}

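Change it to the following. The directory name includes the exact OpenJDK build, so adjust it to match what is installed under /usr/lib/jvm on your server:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/jre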

Save and close the file.
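Because the JAVA_HOME path depends on the OpenJDK build that yum installed, one way to find it is to resolve the java executable back to its real location. This is a sketch under the assumption that java is on the PATH and reached through symlinks (as is typical on CentOS); `java_home_from_binary` is a hypothetical helper name.

```shell
# Hypothetical helper: derive a JAVA_HOME candidate from a java executable path.
java_home_from_binary() {
  # Resolve symlinks such as /usr/bin/java -> /etc/alternatives/java -> real JVM.
  real="$(readlink -f "$1")"
  # Strip the trailing /bin/java to get the directory Hadoop expects.
  dirname "$(dirname "$real")"
}

# Usage on the lab server (assumes java is on the PATH):
# java_home_from_binary "$(command -v java)"
```

The printed directory is the value to place after export JAVA_HOME= in etc/hadoop/hadoop-env.sh.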

Configure Core Hadoop

Set the default filesystem to HDFS on localhost in /home/cloud_user/hadoop/etc/hadoop/core-site.xml by changing the following lines:

<configuration>
</configuration>

Change them to this:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Save and close the file.
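For a scripted setup, the same edit can be made without opening an editor. The sketch below uses a hypothetical helper, `write_site_xml`, that overwrites a site file with a single property; note that it replaces the whole file, including the license comment that ships in it, so it only suits a fresh installation like this one. It uses the `fs.defaultFS` property name.

```shell
# Hypothetical helper: write a Hadoop *-site.xml file containing one property.
# Overwrites the target file completely.
write_site_xml() {
  target="$1"
  name="$2"
  value="$3"
  cat > "$target" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>${name}</name>
    <value>${value}</value>
  </property>
</configuration>
EOF
}

# Usage from /home/cloud_user/hadoop:
# write_site_xml etc/hadoop/core-site.xml fs.defaultFS hdfs://localhost:9000
```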

Configure HDFS

Set the default block replication to 1 in /home/cloud_user/hadoop/etc/hadoop/hdfs-site.xml by changing the following lines:

<configuration>
</configuration>

Change them to this:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Save and close the file.
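A quick way to confirm the edit before starting Hadoop is to read the value back from the file. The following sketch assumes each <name> and <value> element sits on its own line, as in the snippets in this lab; `site_value` is a hypothetical helper name. Once Hadoop is configured, bin/hdfs getconf -confKey dfs.replication reports the value Hadoop itself resolves.

```shell
# Hypothetical helper: print the <value> that follows a given <name> in a site file.
# Assumes each <name> and <value> element is on its own line.
site_value() {
  awk -v want="$2" '
    /<name>/  { n = $0; gsub(/.*<name>|<\/name>.*/, "", n) }
    /<value>/ { v = $0; gsub(/.*<value>|<\/value>.*/, "", v)
                if (n == want) { print v; exit } }
  ' "$1"
}

# Usage from /home/cloud_user/hadoop:
# site_value etc/hadoop/hdfs-site.xml dfs.replication
```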

Set up Passwordless SSH Access to localhost

As cloud_user, generate a public/private RSA key pair with:

ssh-keygen

The default option for each prompt will suffice.

Add your newly generated public key to your authorized keys list with:

cat ~/.ssh/id_rsa.pub >>  ~/.ssh/authorized_keys
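If the step above is run more than once, the key is appended again each time, and overly permissive file modes are a common reason passwordless login still prompts for a password. The sketch below shows one way to make the step repeatable; `authorize_key` is a hypothetical helper name, and the 700/600 modes reflect what sshd typically requires under its default StrictModes setting.

```shell
# Hypothetical helper: add a public key to authorized_keys once and fix permissions.
authorize_key() {
  pubkey="$1"
  ssh_dir="$2"
  auth="$ssh_dir/authorized_keys"
  mkdir -p "$ssh_dir"
  touch "$auth"
  # Append only if this exact key line is not already present.
  grep -qxF -f "$pubkey" "$auth" || cat "$pubkey" >> "$auth"
  # sshd may ignore key files that are writable by group or others.
  chmod 700 "$ssh_dir"
  chmod 600 "$auth"
}

# Usage as cloud_user:
# authorize_key ~/.ssh/id_rsa.pub ~/.ssh
```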

Test passwordless SSH to localhost with:

ssh localhost

Add localhost to the list of known hosts by accepting its key with yes.

Exit the SSH session to localhost with:

exit

Format the Filesystem

From /home/cloud_user/hadoop/, format the DFS with:

bin/hdfs namenode -format

Start Hadoop

Start the NameNode and DataNode daemons from /home/cloud_user/hadoop with:

sbin/start-dfs.sh

Accept the key for 0.0.0.0 when prompted with yes (this will only need to be done the first time we start Hadoop).
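To confirm the daemons actually came up, the output of jps can be checked for the expected process names. This is a sketch with some assumptions: jps ships with the JDK development package (for example java-1.8.0-openjdk-devel), which this lab does not install, and start-dfs.sh is expected to launch a SecondaryNameNode alongside the NameNode and DataNode. `missing_hdfs_daemons` is a hypothetical helper name.

```shell
# Hypothetical helper: read `jps` output on stdin and report missing HDFS daemons.
missing_hdfs_daemons() {
  output="$(cat)"
  missing=""
  for daemon in NameNode DataNode SecondaryNameNode; do
    # jps prints "<pid> <name>"; match the name as the whole second field.
    printf '%s\n' "$output" \
      | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
      || missing="$missing $daemon"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all HDFS daemons running"
}

# Usage (assumes jps is installed):
# jps | missing_hdfs_daemons
```

If a daemon is reported missing, the log files under /home/cloud_user/hadoop/logs are the first place to look.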

Download and Copy the Latin Text to Hadoop

From /home/cloud_user/hadoop, download the latin.txt file with:

curl -O https://raw.githubusercontent.com/linuxacademy/content-hadoop-quick-start/master/latin.txt

From /home/cloud_user/hadoop, create the /user and /user/cloud_user directories in Hadoop with:

bin/hdfs dfs -mkdir -p /user/cloud_user

From /home/cloud_user/hadoop/, copy the latin.txt file to Hadoop at /user/cloud_user/latin with:

bin/hdfs dfs -put latin.txt latin

Examine the latin.txt Text with MapReduce

From /home/cloud_user/hadoop/, use the hadoop-mapreduce-examples-3.3.0.jar to calculate the average length of the words in the /user/cloud_user/latin file and save the job output to /user/cloud_user/latin_wordmean_output in Hadoop with:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar wordmean latin latin_wordmean_output

From /home/cloud_user/hadoop/, examine your wordmean job output files with:

bin/hdfs dfs -cat latin_wordmean_output/*
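As a sanity check on the job output, the same statistic can be computed locally on the downloaded file. The sketch below assumes that wordmean treats every whitespace-separated token as a word, so punctuation counts toward word length; `local_word_mean` is a hypothetical helper name, and the count, length, and mean labels are chosen here for readability rather than taken from Hadoop.

```shell
# Hypothetical helper: mean length of whitespace-separated words in a local file.
local_word_mean() {
  awk '
    # Count every field on every line and accumulate its character length.
    { for (i = 1; i <= NF; i++) { count++; total += length($i) } }
    END {
      if (count > 0)
        printf "count=%d length=%d mean=%.4f\n", count, total, total / count
    }
  ' "$1"
}

# Usage from /home/cloud_user/hadoop:
# local_word_mean latin.txt
```

If the local figures differ noticeably from the job output, the tokenization assumption above is the first thing to revisit.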

Additional Resources

As data engineers for a small company that provides data platforms and analytics services, we've been tasked with installing and configuring a single-node Hadoop cluster. This will be used by our customer to perform language analysis.

For this job, we have been given a bare CentOS 7 cloud server. On it, we will deploy and configure Hadoop in the cloud_user home directory at /home/cloud_user/hadoop. The default filesystem should be set to hdfs://localhost:9000 to facilitate pseudo-distributed operation. Because it will be a single-node cluster, we must set dfs.replication to 1.

After we have deployed, configured, formatted, and started Hadoop, we must prepare it to execute a MapReduce job. Specifically, we must download some Latin text from the customer at https://raw.githubusercontent.com/linuxacademy/content-hadoop-quick-start/master/latin.txt and use the hadoop-mapreduce-examples-3.3.0.jar application that ships with Hadoop to determine the average length of the words in the file. The latin.txt file should be copied to Hadoop at /user/cloud_user/latin, and the output of the MapReduce job should be written to /user/cloud_user/latin_wordmean_output.

Note: We can execute the hadoop-mapreduce-examples-3.3.0.jar application without arguments to get usage information if we aren't sure which class and class arguments to use.

Important: Don't forget that we'll need to install Java, configure JAVA_HOME for Hadoop, and set up passwordless SSH to localhost before attempting to start Hadoop.

Logging In

Use the credentials on the hands-on lab page to log in to the server.
