Exploring AML Designer Transforms: Clip Values

30 minutes
  • 3 Learning Objectives

About this Hands-on Lab

Much of the time in a machine learning project is spent up front, understanding the data and getting it into the right shape to train a model. This is the Data Wrangling, Exploration, and Cleaning phase of the machine learning lifecycle. In Azure Machine Learning Designer, many common data transformation operations are provided as transform modules.

In this lab, you will explore the `Clip Values` module to gain a deeper understanding of the tools at your disposal.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Set Up the Workspace

Log in and go to the Azure Machine Learning Studio workspace provided in the lab.

Create a Training Cluster of D2 instances.

Create a new blank Pipeline in the Azure Machine Learning Studio Designer.

Explore Clip Values

Add an Automobile price data (Raw) node to the canvas. Visualize the data and look at the length feature. Note the total number of values in this column, as well as how many are missing.
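If you want to cross-check the numbers shown in the Designer's Visualize pane, the same counts can be reproduced outside the Designer. The following is a minimal pandas sketch, assuming the column has been loaded into a Series; the values are made-up stand-ins, not the real Automobile price data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the "length" column; the values are made up.
length = pd.Series([168.8, 171.2, np.nan, 176.6, 192.7], name="length")

total_rows = len(length)            # all entries, missing or not
present = int(length.count())       # non-missing entries only
missing = int(length.isna().sum())  # missing entries only

print(f"rows={total_rows} present={present} missing={missing}")
```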

The top and bottom 3 values in the length feature have been determined to be outliers. Using a single Clip Values node, remove these values from the dataset. Do not add any additional columns to the dataset.
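To make the intended effect concrete, here is a rough pandas approximation of this step: the three smallest and three largest values are replaced with missing values in place, and no extra columns are added. This is only a sketch of the behaviour, not the Clip Values module's implementation. The helper name `clip_to_missing` is hypothetical and the data is a toy example.

```python
import numpy as np
import pandas as pd


def clip_to_missing(values: pd.Series, n: int = 3) -> pd.Series:
    """Return a copy with the n smallest and n largest values set to NaN."""
    result = values.astype(float).copy()
    lowest = result.nsmallest(n).index   # labels of the n smallest values
    highest = result.nlargest(n).index   # labels of the n largest values
    result.loc[lowest.union(highest)] = np.nan
    return result


# Toy values, not the real Automobile price data.
length = pd.Series([141.1, 150.0, 155.9, 157.3, 165.3, 168.8, 171.2,
                    173.2, 176.6, 188.8, 192.7, 199.6, 202.6, 208.1])

clipped = clip_to_missing(length, n=3)
print(clipped.isna().sum(), clipped.min(), clipped.max())
```

Note that the sketch picks exactly `n` entries at each end. If the real column contains tied values at the boundary, a threshold-based clip may affect a different number of rows.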

Submit the Pipeline to perform the transformation.

Visualize the Transformed Data

When the pipeline finishes, inspect the output of the Clip Values node. How many rows are now missing data?
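The same question can be answered programmatically if the transformed output is exported to a DataFrame. A minimal sketch, assuming a toy frame with illustrative column names and values, counts missing entries per column and rows with at least one missing value:

```python
import numpy as np
import pandas as pd

# Toy frame; column names and values are illustrative only.
cars = pd.DataFrame({
    "length": [np.nan, 171.2, 176.6, np.nan],
    "price": [13495.0, np.nan, 16500.0, 13950.0],
})

missing_per_column = cars.isna().sum()                   # count per column
rows_with_missing = int(cars.isna().any(axis=1).sum())   # rows with any gap

print(missing_per_column)
print(rows_with_missing)
```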

Additional Resources

Occasionally, you find outliers in your data that are not representative of reality. We don't want to train models on unrealistic data because models predict based on what they have seen. Garbage in, garbage out. Imaginary data in, imaginary predictions out. One way to deal with such outliers is to remove them from the dataset. Azure Machine Learning Designer provides the Clip Values module to trim and manipulate the edges of your datasets.

Let's assume that, through testing and mathematics, we have determined that the length feature of the Automobile price data (Raw) dataset contains outliers. We will remove these outliers using Clip Values.

Lab Goals

  1. Set up the Workspace
  2. Explore Clip Values

Logging in to the lab environment

To avoid issues with the lab, use a new Incognito or Private browser window to log in to the lab. This ensures that your personal account credentials, which may be active in your main window, are not used for the lab.

What are Hands-on Labs

Hands-on Labs are real environments created by industry experts to help you learn. These environments help you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and let you learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.
