Exploring AML Designer Transforms: Clean Missing Data

15 minutes
  • 3 Learning Objectives

About this Hands-on Lab

Much of the time in a machine learning project is spent understanding the data and getting it into a shape suitable for training a model. This is the Data Wrangling, Exploration, and Cleaning phase of the machine learning lifecycle. In Azure Machine Learning Designer, many common data transformation operations are provided as transform modules.

In this lab, you will explore the `Clean Missing Data` module to gain a deeper understanding of the tools at your disposal.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Set Up the Workspace

Log in and go to the Azure Machine Learning Studio workspace provided in the lab.

Create a Training Cluster of D2 instances.

Create a new blank Pipeline in the Azure Machine Learning Studio Designer.

Explore Clean Missing Data

Add an Adult Census Income Binary Classification dataset node to the pipeline. Visualize this raw data to see what data is missing.

Find a column that is missing values in fewer than 5% of its rows. To do this, you need the total row count and the number of missing values in each column. Both are shown in the Visualize popup.
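The arithmetic behind this step can be sketched outside the Designer. The following is a minimal, standard-library Python illustration, not the Designer's own implementation. The sample rows, the column names, and the helper names `missing_summary` and `columns_under` are all hypothetical, and `None` stands in for a missing value.

```python
# Hypothetical data: 100 rows, with None marking a missing value.
rows = [
    {"age": 30 + i % 40, "workclass": "Private", "native-country": "United-States"}
    for i in range(100)
]
for i in range(6):          # 6 of 100 rows lack "workclass" (6%)
    rows[i]["workclass"] = None
for i in range(10, 12):     # 2 of 100 rows lack "native-country" (2%)
    rows[i]["native-country"] = None


def missing_summary(rows):
    """Return {column: (missing_count, missing_ratio)} for a list of dict rows."""
    total = len(rows)
    columns = list(rows[0].keys()) if rows else []
    summary = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) is None)
        summary[col] = (missing, missing / total)
    return summary


def columns_under(rows, threshold):
    """Columns with at least one missing value and a missing ratio below threshold."""
    return [
        col
        for col, (count, ratio) in missing_summary(rows).items()
        if 0 < ratio < threshold
    ]


candidates = columns_under(rows, 0.05)
```

In this made-up data, only `native-country` falls under the 5% threshold. In the lab itself, read the equivalent counts from the Visualize popup rather than computing them.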

Using a Clean Missing Data transformation, remove the rows that are missing data in the chosen column.
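Conceptually, removing rows that lack a value in one column is a simple filter. The sketch below shows that idea in plain Python under stated assumptions: the rows and the helper name `remove_rows_missing` are hypothetical, `None` represents a missing value, and this is not how the Designer module is implemented internally.

```python
def remove_rows_missing(rows, column):
    """Return a new list containing only rows with a value in `column`."""
    return [row for row in rows if row.get(column) is not None]


# Hypothetical sample rows; None marks a missing value.
sample = [
    {"age": 39, "native-country": "United-States"},
    {"age": 50, "native-country": None},
    {"age": 38, "native-country": "Cuba"},
]

cleaned = remove_rows_missing(sample, "native-country")
```

The original list is left unchanged, which mirrors how a pipeline node produces a new output dataset instead of modifying its input.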

Submit the Pipeline to perform the transformation.

Visualize the Transformed Data

When the pipeline finishes, inspect the output of the Clean Missing Data node. How have the column statistics changed?

You can continue to chain Clean Missing Data nodes to clean other columns. You can also select multiple columns in a single node when you want to apply the same operation, with the same threshold values, to all of them.
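One way to picture a single node cleaning several columns with shared thresholds is sketched below. It assumes the thresholds are inclusive lower and upper bounds on each column's missing-value ratio, measured on the input data, and that a column is only cleaned when its ratio falls inside that range. That interpretation, along with the names `clean_columns`, `min_ratio`, and `max_ratio`, is an assumption made for illustration.

```python
def clean_columns(rows, columns, min_ratio=0.0, max_ratio=1.0):
    """Drop rows missing a value in any selected column whose missing
    ratio lies within [min_ratio, max_ratio]."""
    total = len(rows)
    if total == 0:
        return []
    # Decide per column whether the operation applies at all.
    in_range = []
    for col in columns:
        ratio = sum(1 for row in rows if row.get(col) is None) / total
        if min_ratio <= ratio <= max_ratio:
            in_range.append(col)
    # Apply the same "remove row" operation for every qualifying column.
    return [
        row for row in rows
        if all(row.get(col) is not None for col in in_range)
    ]


# Hypothetical data: 6% of rows lack "workclass", 2% lack "native-country".
data = [
    {"workclass": "Private", "native-country": "United-States"}
    for _ in range(100)
]
for i in range(6):
    data[i]["workclass"] = None
for i in range(10, 12):
    data[i]["native-country"] = None

both = clean_columns(data, ["workclass", "native-country"])
capped = clean_columns(data, ["workclass", "native-country"], max_ratio=0.05)
```

With the default thresholds both columns are cleaned and 92 rows remain. With `max_ratio=0.05`, `workclass` is skipped because 6% exceeds the cap, so only the 2 rows missing `native-country` are dropped and 98 rows remain.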

Additional Resources

Handling missing data is a common task during data preparation. Ideally every dataset would have complete coverage for every feature, but that is rarely the case. How you handle missing data depends on what kind of data is missing, how important the information is, and what percentage of the data is missing. Machine Learning Studio provides the Clean Missing Data module, which has many options to help clean up this messy data.
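Removing rows is only one way to handle gaps; substituting a value is another. As a rough illustration of how the choice depends on the kind of data, the sketch below fills a numeric column with its mean and a categorical column with its most common value. The helper name `fill_missing`, the strategy names, and the sample rows are hypothetical and are not the module's option names.

```python
from statistics import mean, mode


def fill_missing(rows, column, strategy):
    """Return new rows with missing values in `column` replaced.

    strategy: "mean" for numeric columns, "mode" for categorical ones.
    """
    present = [row[column] for row in rows if row.get(column) is not None]
    if strategy == "mean":
        fill = mean(present)
    elif strategy == "mode":
        fill = mode(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Copy only the rows that change; leave complete rows as they are.
    return [
        {**row, column: fill} if row.get(column) is None else row
        for row in rows
    ]


# Hypothetical sample rows; None marks a missing value.
people = [
    {"hours-per-week": 40, "workclass": "Private"},
    {"hours-per-week": None, "workclass": "Private"},
    {"hours-per-week": 50, "workclass": None},
    {"hours-per-week": 45, "workclass": "State-gov"},
]

filled = fill_missing(people, "hours-per-week", "mean")
filled = fill_missing(filled, "workclass", "mode")
```

Substitution keeps every row but changes the column's distribution, while row removal keeps the distribution of observed values but shrinks the dataset. Which trade-off is acceptable depends on how much data is missing and how important the column is.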

Lab Goals

  1. Set up the Workspace
  2. Explore Clean Missing Data
  3. Visualize the Transformed Data

Logging in to the lab environment

To avoid issues with the lab, use a new Incognito or Private browser window to log in to the lab. This ensures that your personal account credentials, which may be active in your main window, are not used for the lab.

