Exploring AML Designer Transforms: Join Data

30 minutes
  • 3 Learning Objectives

About this Hands-on Lab

Much of the time spent on a machine learning task goes into understanding the data and getting it into the right shape to train the model. This is the Data Wrangling, Exploration, and Cleaning phase of the machine learning lifecycle. In Azure Machine Learning designer, many common data manipulation operations are provided as transform modules. In this lab, you will explore the Join Data module to gain a deeper understanding of the tools at your disposal.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Set Up the Workspace

Log in and go to the Azure Machine Learning Studio workspace provided in the lab.

Create a Training Cluster of D2 instances.

Create a new blank Pipeline in the Azure Machine Learning Studio Designer.

Explore Join Data

Add the IMDB Movie Titles and Movie Ratings dataset nodes to the canvas. Visualize both datasets to see whether they have any data in common. Note that the columns holding the shared data might not have exactly the same name in both datasets.

Using a Join Data transform node, combine the two datasets on the columns that hold their shared data. This will be an Inner Join operation. Because both key columns carry the same values after the join, remove one of them.

Submit the Pipeline to perform the transformation.

Visualize the Transformed Data

When the pipeline finishes, inspect the output of the Join Data node. Can you tell what movie is being reviewed now? Was the duplicate ID column removed successfully?

Additional Resources

When using data exported from a relational database, you will commonly have multiple datasets representing multiple database tables that share some common ID. However, to train a model on all of the features in these tables, we need a single dataset, which means we need a way to combine the datasets. Azure Machine Learning Designer provides the Join Data module to put together two datasets based on their shared ID column. As the name suggests, this module produces the equivalent of a basic SQL JOIN operation.
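To make the inner join behavior concrete, here is a minimal sketch in plain Python that joins two small tables on key columns with different names and drops the duplicate key column. It only illustrates the idea and is not how the Join Data module is implemented. The helper inner_join, the column names ("MovieId", "Movie ID", "Movie Name"), and the sample rows are assumptions made up for this example; check the actual column names by visualizing the datasets in the designer.

```python
def inner_join(left, right, left_key, right_key, keep_right_key=False):
    """Inner-join two lists of dict rows where left_key equals right_key.

    Hypothetical helper for illustration only. If both tables share a
    non-key column name, the value from the right table wins.
    """
    # Index the right-hand rows by key value so matches are quick to find.
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)

    joined = []
    for lrow in left:
        # Left rows without a match are dropped: that is the "inner" part.
        for rrow in index.get(lrow[left_key], []):
            merged = dict(lrow)
            for col, value in rrow.items():
                if col == right_key and not keep_right_key:
                    continue  # skip the duplicate key column
                merged[col] = value
            joined.append(merged)
    return joined


# Made-up sample rows; the real datasets are much larger.
ratings = [
    {"UserId": 1, "MovieId": 10, "Rating": 8},
    {"UserId": 2, "MovieId": 10, "Rating": 6},
    {"UserId": 3, "MovieId": 99, "Rating": 7},  # no matching title
]
titles = [
    {"Movie ID": 10, "Movie Name": "Example Movie A"},
    {"Movie ID": 20, "Movie Name": "Example Movie B"},  # no ratings
]

result = inner_join(ratings, titles, "MovieId", "Movie ID")
for row in result:
    print(row)
```

Only the two ratings whose ID appears in both tables survive the join. The rating for ID 99 and the title for ID 20 are dropped, which is what distinguishes an inner join from an outer join.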

Lab Goals

  1. Set up the Workspace
  2. Explore Join Data

Logging in to the lab environment

To avoid issues with the lab, use a new Incognito or Private browser window to log in to the lab. This ensures that your personal account credentials, which may be active in your main window, are not used for the lab.

What are Hands-on Labs

Hands-on Labs are real environments created by industry experts to help you learn. These environments let you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.
