Running a Spark Job on a Dataproc Cluster

45 minutes, 3 Learning Objectives

About this Hands-on Lab

Google Cloud Dataproc is a fully managed, highly scalable service that can run Apache Spark as well as over 30 other open-source tools and frameworks. In this hands-on lab, you’ll provision a Dataproc cluster and then submit a Spark job that computes and outputs an approximation of Pi.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Enable the Necessary API

Enable the Dataproc API, either through the Cloud Console user interface or from Cloud Shell.
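If you choose Cloud Shell, the step comes down to a single `gcloud services enable` call. The sketch below only builds and prints that command so you can review it before pasting it into a terminal. The helper name `enable_api_command` is made up for this illustration.

```python
import shlex
from typing import List

# Service name of the Dataproc API on Google Cloud.
DATAPROC_API = "dataproc.googleapis.com"


def enable_api_command(service: str = DATAPROC_API) -> List[str]:
    """Build (but do not run) the gcloud command that enables an API."""
    return ["gcloud", "services", "enable", service]


command = enable_api_command()
# Print the command so it can be pasted into Cloud Shell.
print(shlex.join(command))
```

Run against the active project, the printed command should have the same effect as enabling the API from the console.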

Create a Dataproc Cluster

Provision a Dataproc cluster capable of running the Spark job.
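As a rough sketch, a cluster can be created from Cloud Shell with `gcloud dataproc clusters create`. The cluster name, region, and worker count below are placeholder assumptions, not values prescribed by the lab, and the helper `create_cluster_command` is hypothetical. The code only assembles and prints the command.

```python
import shlex
from typing import List

# Hypothetical values for illustration; substitute your own.
CLUSTER_NAME = "spark-lab-cluster"
REGION = "us-central1"


def create_cluster_command(name: str, region: str, num_workers: int = 2) -> List[str]:
    """Build (but do not run) a gcloud command that creates a Dataproc cluster."""
    return [
        "gcloud", "dataproc", "clusters", "create", name,
        f"--region={region}",
        f"--num-workers={num_workers}",
    ]


command = create_cluster_command(CLUSTER_NAME, REGION)
print(shlex.join(command))
```

A small cluster is likely sufficient here, since the example job is light. Machine types and other options are left at their defaults in this sketch.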

Submit a Spark Job

Submit the Spark job, and evaluate the results.
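To evaluate the results, look in the job's driver output for the line in which SparkPi reports its estimate, which begins with "Pi is roughly". The sketch below pulls the number out of such a line. The sample text is an illustrative stand-in rather than captured output, and `extract_pi_estimate` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches the line SparkPi prints, e.g. "Pi is roughly 3.14".
PI_LINE = re.compile(r"Pi is roughly ([0-9]+\.[0-9]+)")


def extract_pi_estimate(driver_output: str) -> Optional[float]:
    """Return the estimate reported in the driver output, or None if absent."""
    match = PI_LINE.search(driver_output)
    return float(match.group(1)) if match else None


# Illustrative stand-in for driver output; real output has many more log lines
# and a different estimate on each run.
sample_output = "starting job\nPi is roughly 3.1415\njob finished\n"
print(extract_pi_estimate(sample_output))
```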

Additional Resources

You’ve been asked to get familiar with running Apache Spark jobs on Dataproc. As a proof of concept, you decide to run a Spark job on a Dataproc cluster that approximates the value of Pi to a configurable degree of precision.
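For intuition about what the job does: Spark's bundled SparkPi example estimates Pi by random sampling. It scatters points over a square and counts how many fall inside the inscribed circle. The single-machine sketch below illustrates that idea only. It is not the Spark code, and the function name and sample counts are assumptions chosen for the example.

```python
import random


def estimate_pi(samples: int, seed: int = 0) -> float:
    """Estimate Pi from the fraction of random points that land in the unit circle."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    inside = 0
    for _ in range(samples):
        # Draw a point uniformly from the square [-1, 1] x [-1, 1].
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    # Circle area / square area = Pi / 4, so scale the fraction by 4.
    return 4.0 * inside / samples


# More samples generally give a closer estimate.
for samples in (1_000, 100_000):
    print(samples, estimate_pi(samples))
```

On the cluster, the same sampling is spread across many parallel tasks, which is why the precision can be varied through a job argument.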

When you submit the job, use the following entry for the Main class or jar field:

org.apache.spark.examples.SparkPi

Use the following entry for the Jar files field:

file:///usr/lib/spark/examples/jars/spark-examples.jar
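The same two entries can be supplied from Cloud Shell through `gcloud dataproc jobs submit spark`. The sketch below assembles that command without running it. The cluster name, region, and trailing argument of 1000 are assumptions for illustration, and `submit_spark_pi_command` is a hypothetical helper. The trailing argument is passed through to SparkPi and controls how much sampling work the job does.

```python
import shlex
from typing import List

# Entries given in the lab instructions.
MAIN_CLASS = "org.apache.spark.examples.SparkPi"
JAR_FILE = "file:///usr/lib/spark/examples/jars/spark-examples.jar"


def submit_spark_pi_command(cluster: str, region: str, tasks: int = 1000) -> List[str]:
    """Build (but do not run) the gcloud command that submits the SparkPi job."""
    return [
        "gcloud", "dataproc", "jobs", "submit", "spark",
        f"--cluster={cluster}",
        f"--region={region}",
        f"--class={MAIN_CLASS}",
        f"--jars={JAR_FILE}",
        "--",        # everything after this marker is passed to SparkPi itself
        str(tasks),
    ]


# Hypothetical cluster name and region; substitute your own.
command = submit_spark_pi_command("spark-lab-cluster", "us-central1")
print(shlex.join(command))
```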
