Setting Up a Data Streaming Pipeline with Dataflow

45 minutes
  • 5 Learning Objectives

About this Hands-on Lab

One of the primary benefits of Dataflow is that it can handle both streaming and batch data processing in a serverless, fast, and cost-effective manner. In this hands-on lab, you’ll establish the necessary infrastructure (including a Cloud Storage bucket, a Pub/Sub topic, and a BigQuery dataset) to run a Dataflow template on real-time streaming data from New York City’s ever-busy taxi service.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Enable the Necessary APIs

Enable the Dataflow, Google Cloud Storage JSON, BigQuery, Cloud Pub/Sub, and Cloud Resource Manager APIs, either in the Cloud Console or from Cloud Shell.
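If you take the Cloud Shell route, all five APIs can be enabled with a single `gcloud services enable` call. The sketch below only builds that command (and runs it if asked); it assumes the gcloud CLI is installed and pointed at your lab project, and it maps the Cloud Storage JSON API to its service name `storage-api.googleapis.com`.

```python
import shlex
import subprocess

# Service names for the five APIs listed above; the Cloud Storage JSON API
# is published as storage-api.googleapis.com.
REQUIRED_APIS = [
    "dataflow.googleapis.com",
    "storage-api.googleapis.com",
    "bigquery.googleapis.com",
    "pubsub.googleapis.com",
    "cloudresourcemanager.googleapis.com",
]


def enable_apis(run=False):
    """Build (and optionally run) one `gcloud services enable` call."""
    cmd = ["gcloud", "services", "enable", *REQUIRED_APIS]
    if run:
        # Requires the gcloud CLI and an active project (e.g. in Cloud Shell).
        subprocess.run(cmd, check=True)
    return cmd


# Preview the command without executing it.
print(shlex.join(enable_apis()))
```

Passing every service in one call keeps the step idempotent: re-running it on already-enabled APIs is harmless.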

Create a Storage Bucket

Create a Cloud Storage bucket to hold the temporary Dataflow data.
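A minimal sketch of this step from the command line, assuming `gcloud storage buckets create` and a hypothetical bucket name and region (`my-project-dataflow-temp`, `us-central1`). The name check is a simplified version of Cloud Storage's naming rules (it ignores dotted names and other edge cases), meant only to catch obvious typos before calling gcloud.

```python
import re
import shlex

# Simplified Cloud Storage naming rule (names without dots): 3-63 characters of
# lowercase letters, digits, dashes, or underscores, starting and ending with a
# letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def create_bucket_command(bucket, location="us-central1"):
    """Return the argv for `gcloud storage buckets create`, after a name check."""
    if not _BUCKET_NAME.fullmatch(bucket):
        raise ValueError(f"invalid bucket name: {bucket!r}")
    return [
        "gcloud", "storage", "buckets", "create",
        f"gs://{bucket}",
        f"--location={location}",
    ]


# Hypothetical bucket name; bucket names are global, so pick a unique one.
print(shlex.join(create_bucket_command("my-project-dataflow-temp")))
```

Keeping the bucket in the same region you later run the Dataflow job in avoids cross-region traffic for the temporary files.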

Create a Dataset and Table

Create a BigQuery dataset and a table with the proper schema to hold the data written by the Dataflow job.
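One way to do this is with the `bq` CLI: `bq mk --dataset` for the dataset, then `bq mk --table` with the lab's schema in inline `name:type` form. The sketch below builds both commands; the project, dataset, and table names (`my-project`, `taxirides`, `realtime`) are hypothetical placeholders.

```python
import shlex

# Inline schema from the lab instructions, in bq's "name:type,..." form.
TAXI_SCHEMA = (
    "ride_id:string,point_idx:integer,latitude:float,longitude:float,"
    "timestamp:timestamp,meter_reading:float,meter_increment:float,"
    "ride_status:string,passenger_count:integer"
)


def bq_mk_commands(project, dataset, table, schema=TAXI_SCHEMA):
    """Return argv lists that create the dataset first, then the table."""
    return [
        ["bq", "mk", "--dataset", f"{project}:{dataset}"],
        ["bq", "mk", "--table", f"{project}:{dataset}.{table}", schema],
    ]


# Hypothetical names; print the commands so they can be pasted into Cloud Shell.
for cmd in bq_mk_commands("my-project", "taxirides", "realtime"):
    print(shlex.join(cmd))
```

The dataset must exist before the table command runs, which is why the two commands are returned in that order.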

Run a Pub/Sub to BigQuery Dataflow Job

Use the Pub/Sub to BigQuery Dataflow template to process the data.
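As a rough sketch, the classic Pub/Sub-to-BigQuery template can also be launched with `gcloud dataflow jobs run`, passing the input topic and the output table as template parameters (`inputTopic`, `outputTableSpec`). The template path below assumes the regional Google-provided template bucket and the `latest` version; check the current Dataflow template documentation, since template names and versions change. Job, project, bucket, dataset, and table names are hypothetical.

```python
import shlex

INPUT_TOPIC = "projects/pubsub-public-data/topics/taxirides-realtime"


def dataflow_run_command(job_name, project, region, bucket, dataset, table):
    """Argv for launching the (assumed) classic Pub/Sub-to-BigQuery template."""
    template = f"gs://dataflow-templates-{region}/latest/PubSub_to_BigQuery"
    params = f"inputTopic={INPUT_TOPIC},outputTableSpec={project}:{dataset}.{table}"
    return [
        "gcloud", "dataflow", "jobs", "run", job_name,
        f"--project={project}",
        f"--region={region}",
        f"--gcs-location={template}",
        # Temporary/staging files go to the Cloud Storage bucket for this lab.
        f"--staging-location=gs://{bucket}/temp",
        f"--parameters={params}",
    ]


print(shlex.join(dataflow_run_command(
    "taxi-stream", "my-project", "us-central1",
    "my-project-dataflow-temp", "taxirides", "realtime",
)))
```

Note that `outputTableSpec` uses the `project:dataset.table` form, while the topic is given as a full `projects/.../topics/...` path.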

Query the Resulting Dataset

Write and run queries against the table to analyze the streamed taxi data.
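For example, a query that counts rows per `ride_status` over the last hour gives a quick check that data is arriving. The sketch below builds that GoogleSQL query and a `bq query --use_legacy_sql=false` command for it; the table name is a hypothetical placeholder, and the column is backtick-quoted as `timestamp` only to be safe.

```python
import shlex


def ride_status_query(project, dataset, table):
    """GoogleSQL: count rows (ride points) per ride_status over the last hour."""
    return (
        "SELECT ride_status, COUNT(*) AS num_points "
        f"FROM `{project}.{dataset}.{table}` "
        "WHERE `timestamp` > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR) "
        "GROUP BY ride_status "
        "ORDER BY num_points DESC"
    )


def bq_query_command(sql):
    """Argv for running the query with standard SQL in the bq CLI."""
    return ["bq", "query", "--use_legacy_sql=false", sql]


print(shlex.join(bq_query_command(ride_status_query("my-project", "taxirides", "realtime"))))
```

Each row in the table is one point along a ride (`point_idx`), so the count is of points, not distinct rides; use `COUNT(DISTINCT ride_id)` for rides.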

Additional Resources

Your company has a new client: New York City. You have been tasked with establishing a way to analyze ongoing data from the city’s taxi cabs. You decide to use a Dataflow template that reads the streaming data from a Pub/Sub topic and writes it, properly formatted, to a BigQuery dataset.

To accomplish this task, you’ll need to complete the following steps:

  1. Enable the necessary APIs.
  2. Create a storage bucket.
  3. Create a dataset and table.
  4. Run a Pub/Sub to BigQuery Dataflow job.
  5. Query the resulting dataset.

Use the following schema for your BigQuery table:

ride_id:string,point_idx:integer,latitude:float,longitude:float,timestamp:timestamp,meter_reading:float,meter_increment:float,ride_status:string,passenger_count:integer
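The schema above is in the inline `name:type` form. If you prefer a JSON schema file (or the Console's "Edit as text" box), the sketch below converts it into BigQuery's JSON field list. Setting every field to `NULLABLE` is an assumption chosen to match how inline schemas are created.

```python
import json

# Inline schema exactly as given in the lab.
INLINE_SCHEMA = (
    "ride_id:string,point_idx:integer,latitude:float,longitude:float,"
    "timestamp:timestamp,meter_reading:float,meter_increment:float,"
    "ride_status:string,passenger_count:integer"
)


def to_json_schema(inline):
    """Convert 'name:type,...' into BigQuery's JSON schema-file field list."""
    fields = []
    for spec in inline.split(","):
        name, sep, ftype = spec.strip().partition(":")
        if not (name and sep and ftype):
            raise ValueError(f"malformed field spec: {spec!r}")
        # NULLABLE is assumed here to match how inline schemas are created.
        fields.append({"name": name, "type": ftype.upper(), "mode": "NULLABLE"})
    return fields


print(json.dumps(to_json_schema(INLINE_SCHEMA), indent=2))
```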

In the Dataflow template, set the input Pub/Sub topic to:

projects/pubsub-public-data/topics/taxirides-realtime
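This topic belongs to the public `pubsub-public-data` project, not to your lab project, so the template needs the full `projects/<project>/topics/<topic>` path rather than just the topic name. A small sketch that checks a value has that shape before you paste it into the template form:

```python
import re

# Full Pub/Sub topic names have the form projects/<project>/topics/<topic>.
_TOPIC_PATH = re.compile(r"projects/(?P<project>[^/]+)/topics/(?P<topic>[^/]+)")


def parse_topic_path(path):
    """Split a full topic path into (project, topic), rejecting other shapes."""
    match = _TOPIC_PATH.fullmatch(path)
    if match is None:
        raise ValueError(f"not a full Pub/Sub topic path: {path!r}")
    return match.group("project"), match.group("topic")


print(parse_topic_path("projects/pubsub-public-data/topics/taxirides-realtime"))
# → ('pubsub-public-data', 'taxirides-realtime')
```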

What Are Hands-on Labs?

Hands-on Labs are real environments created by industry experts to help you learn. These environments let you gain knowledge and experience, practice without compromising your own systems, test without risk, destroy without fear, and learn from your mistakes. Hands-on Labs let you practice your skills before applying them in the real world.
