Deploy and Configure a Single-Node Hadoop Cluster

2 hours
  • 10 Learning Objectives

About this Hands-on Lab

Many cloud platforms and third-party service providers offer Hadoop as a service or as a VM/container image, which lowers the barrier to entry for anyone getting started with Hadoop. In this hands-on lab, you will have the opportunity to deploy a single-node Hadoop cluster in a pseudo-distributed configuration. Doing so walks you through deploying and configuring each individual Hadoop component, which will prepare you for the move to a multi-node cluster where Hadoop services are separated and clustered across machines. In this learning activity, you will be performing the following:

* Installing Java
* Deploying Hadoop from an archive file
* Configuring Hadoop’s `JAVA_HOME`
* Configuring the default filesystem for Hadoop
* Configuring HDFS replication
* Setting up passwordless SSH
* Formatting the Hadoop Distributed File System (HDFS)
* Starting Hadoop
* Creating files and directories in Hadoop
* Examining a text file with a MapReduce job

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Install Java

Log into Node 1 as cloud_user and install the java-19-amazon-corretto-devel package:

sudo yum -y install java-19-amazon-corretto-devel
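If you'd like to confirm the installation before moving on, check the Java version (the exact version string will vary with the Corretto build):

java -version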
Deploy Hadoop

From the cloud_user home directory, download Hadoop 3.3.4 from your preferred Apache mirror. The example below uses one such mirror:

curl -O http://mirrors.gigenet.com/apache/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
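Optionally, verify the download against the SHA-512 checksum Apache publishes alongside each release. The URL below assumes the 3.3.4 checksum is still served from downloads.apache.org; older releases are moved to archive.apache.org:

curl -O https://downloads.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz.sha512
sha512sum hadoop-3.3.4.tar.gz
cat hadoop-3.3.4.tar.gz.sha512

The digest printed by sha512sum should match the one in the .sha512 file.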

Unpack the archive in place:

tar -xzf hadoop-3.3.4.tar.gz

Delete the archive file:

rm hadoop-3.3.4.tar.gz

Rename the installation directory:

mv hadoop-3.3.4/ hadoop/
Configure JAVA_HOME

From /home/cloud_user/hadoop, set JAVA_HOME in etc/hadoop/hadoop-env.sh by changing the following line:

export JAVA_HOME=${JAVA_HOME}

Change it to this:

export JAVA_HOME=/usr/lib/jvm/java-19-amazon-corretto/

Save and close the file.
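To confirm that Hadoop picks up the new JAVA_HOME, you can run the hadoop launcher from /home/cloud_user/hadoop; it sources etc/hadoop/hadoop-env.sh and prints the build version if Java is found:

bin/hadoop version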

Configure Core Hadoop

Set the default filesystem to hdfs on localhost in /home/cloud_user/hadoop/etc/hadoop/core-site.xml by changing the following lines:

<configuration>
</configuration>

Change them to this:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Save and close the file.
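If you'd like to confirm the value Hadoop will actually use, the getconf tool reads the same configuration files. From /home/cloud_user/hadoop:

bin/hdfs getconf -confKey fs.defaultFS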

Configure HDFS

Set the default block replication to 1 in /home/cloud_user/hadoop/etc/hadoop/hdfs-site.xml by changing the following lines:

<configuration>
</configuration>

Change them to this:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Save and close the file.
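As with the previous step, you can confirm the setting from /home/cloud_user/hadoop with:

bin/hdfs getconf -confKey dfs.replication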

Set Up Passwordless SSH Access to localhost

As cloud_user, generate a public/private RSA key pair with:

ssh-keygen

The default option for each prompt will suffice.

Add your newly generated public key to your authorized keys list with:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
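Before moving on, verify that key-based login works. The first connection will prompt you to accept the host key, but it should not ask for a password (if it does, check the permissions on ~/.ssh and ~/.ssh/authorized_keys):

ssh localhost 'echo passwordless SSH works'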
Format the Filesystem

From /home/cloud_user/hadoop/, format the DFS with:

bin/hdfs namenode -format
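The format command should finish by reporting that the storage directory was successfully formatted. With the default configuration used in this lab, the NameNode metadata is written under /tmp/hadoop-cloud_user (an assumption based on Hadoop's default hadoop.tmp.dir of /tmp/hadoop-${user.name}), so you can also inspect it directly:

ls /tmp/hadoop-cloud_user/dfs/name/current/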
Start Hadoop

Start the NameNode and DataNode daemons from /home/cloud_user/hadoop with:

sbin/start-dfs.sh
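You can confirm that the daemons started with the JDK's jps tool; the output should include NameNode, DataNode, and SecondaryNameNode processes:

jps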
Download and Copy the Latin Text to Hadoop

From /home/cloud_user/hadoop, download the latin.txt file with:

curl -O https://raw.githubusercontent.com/linuxacademy/content-hadoop-quick-start/master/latin.txt

From /home/cloud_user/hadoop, create the /user and /user/cloud_user directories in Hadoop with:

bin/hdfs dfs -mkdir -p /user/cloud_user

From /home/cloud_user/hadoop/, copy the latin.txt file to Hadoop at /user/cloud_user/latin with:

bin/hdfs dfs -put latin.txt latin
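To verify the upload, list your HDFS home directory from /home/cloud_user/hadoop; relative paths such as latin resolve under /user/cloud_user:

bin/hdfs dfs -ls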
Examine the latin.txt Text with MapReduce

From /home/cloud_user/hadoop/, use the hadoop-mapreduce-examples-*.jar to calculate the average length of the words in the /user/cloud_user/latin file and save the job output to /user/cloud_user/latin_wordmean_output in Hadoop with:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar wordmean latin latin_wordmean_output

From /home/cloud_user/hadoop/, examine your wordmean job output files with:

bin/hdfs dfs -cat latin_wordmean_output/*
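The output directory contains an empty _SUCCESS marker alongside the part file holding the job's results. If you'd like a local copy of the results, or to shut the cluster down when you're finished, the following also work from /home/cloud_user/hadoop (local_output is just an example directory name):

bin/hdfs dfs -get latin_wordmean_output local_output
sbin/stop-dfs.sh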

Additional Resources

As a data engineer for a small company that provides data platforms and analytics services, you've been tasked with installing and configuring a single-node Hadoop cluster. This will be used by your customer to perform language analysis.

For this job, you have been given a bare CentOS 7 cloud server. On it, you will deploy and configure Hadoop in the cloud_user home directory at /home/cloud_user/hadoop. The default filesystem should be set to hdfs://localhost:9000 to facilitate pseudo-distributed operation. Because it will be a single-node cluster, you must set dfs.replication to 1.

After you have deployed, configured, and started Hadoop, you must format and prepare Hadoop to execute a MapReduce job. Specifically, you must download some Latin text from the customer at https://raw.githubusercontent.com/linuxacademy/content-hadoop-quick-start/master/latin.txt and use the hadoop-mapreduce-examples-*.jar application that ships with Hadoop to determine the average length of the words in the file. The latin.txt file should be copied to Hadoop at /user/cloud_user/latin, and the output of the MapReduce job should be written to /user/cloud_user/latin_wordmean_output.

Note: You can execute the hadoop-mapreduce-examples-3.3.4.jar application without arguments to get usage information if you aren't sure which class and class arguments to use.

Important: Don't forget that you'll need to install Java, configure JAVA_HOME for Hadoop, and set up passwordless SSH to localhost before attempting to start Hadoop.
