One of the primary benefits of Dataflow is that it can handle both streaming and batch data processing in a serverless, fast, and cost-effective manner. In this hands-on lab, you'll establish the necessary infrastructure — including a Cloud Storage bucket, a Pub/Sub topic, and a BigQuery dataset — to execute a Dataflow template on real-time streaming data from New York City's ever-busy taxi service.
Learning Objectives
Successfully complete this lab by achieving the following learning objectives:
- Enable the Necessary APIs
Enable the Dataflow, Google Cloud Storage JSON, BigQuery, Cloud Pub/Sub, and Cloud Resource Manager APIs, either through the console UI or the Cloud Shell (see the first sketch after this list).
- Create a Storage Bucket
Create a Cloud Storage bucket to hold Dataflow's temporary staging data (second sketch below).
- Create a Dataset and Table
Create a BigQuery dataset and table with the proper schema to hold the data generated by the streaming pipeline (third sketch below).
- Run a Pub/Sub to BigQuery Dataflow Job
Use the Google-provided Pub/Sub to BigQuery Dataflow template to process the streaming data (fourth sketch below).
- Query the Resulting Dataset
Write and run queries against the resulting BigQuery table to verify the streaming data (final sketch below).
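The sketches below illustrate one way to complete each objective from Cloud Shell. All project IDs, bucket names, and regions are placeholders, not values supplied by the lab. First, enabling the required APIs; the service IDs follow standard Google Cloud naming and can be confirmed with `gcloud services list --available`:

```sh
# Enable the five services this lab depends on in the current project.
gcloud services enable \
  dataflow.googleapis.com \
  storage-api.googleapis.com \
  bigquery.googleapis.com \
  pubsub.googleapis.com \
  cloudresourcemanager.googleapis.com
```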
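Next, a minimal sketch of creating the temporary-data bucket. The bucket name and region are assumptions; bucket names must be globally unique:

```sh
# Create a regional bucket for Dataflow's temporary files.
gsutil mb -l us-central1 gs://my-dataflow-temp-bucket
```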
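Then the BigQuery dataset and table. The `taxirides.realtime` naming and the schema below are assumptions modeled on the public NYC taxi stream; adjust them to match the lab's instructions:

```sh
# Create the dataset, then a table whose schema matches the taxi messages.
bq mk --dataset taxirides
bq mk --table taxirides.realtime \
  ride_id:string,point_idx:integer,latitude:float,longitude:float,\
timestamp:timestamp,meter_reading:float,meter_increment:float,\
ride_status:string,passenger_count:integer
```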
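With the infrastructure in place, the streaming job can be launched from the Google-provided template. `PROJECT_ID` is a placeholder, and the public taxi topic shown here (`projects/pubsub-public-data/topics/taxirides-realtime`) is an assumption based on the common version of this lab:

```sh
# Run the Pub/Sub to BigQuery streaming template, reading from the
# public taxi topic and writing rows into the table created above.
gcloud dataflow jobs run taxi-streaming-job \
  --gcs-location gs://dataflow-templates/latest/PubSub_to_BigQuery \
  --region us-central1 \
  --staging-location gs://my-dataflow-temp-bucket/temp \
  --parameters \
inputTopic=projects/pubsub-public-data/topics/taxirides-realtime,\
outputTableSpec=PROJECT_ID:taxirides.realtime
```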
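Finally, a sample query to confirm that rows are arriving. The table name and `ride_status` column match the assumed schema above; substitute whatever query the lab asks for:

```sh
# Count ingested rides per status using standard SQL.
bq query --use_legacy_sql=false \
  'SELECT ride_status, COUNT(*) AS ride_count
   FROM taxirides.realtime
   GROUP BY ride_status'
```

Because the pipeline is streaming, re-running the query after a minute or two should show the row counts growing.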