A large share of the time in a machine learning project is spent upfront, understanding the data and getting it into a shape suitable for training a model. This is the Data Wrangling, Exploration, and Cleaning phase of the machine learning lifecycle. In Azure Machine Learning Designer, many common data transformation operations are provided as transform modules.
In this lab, you will explore the `Clip Values` module to gain a deeper understanding of the tools at your disposal.
Learning Objectives
Successfully complete this lab by achieving the following learning objectives:
- Set Up the Workspace
Log in and go to the Azure Machine Learning Studio workspace provided in the lab.
Create a Training Cluster of `D2` instances.
Create a new blank Pipeline in the Azure Machine Learning Studio Designer.
- Explore Clip Values
Add an Automobile price data (Raw) node to the canvas. Visualize the data and look at the `length` feature. Note the total number of values in this column, as well as how many are missing.
The top and bottom 3 values in the length feature have been determined to be outliers. Using a single Clip Values node, remove these values from the dataset. Do not add any additional columns to the dataset.
Submit the Pipeline to perform the transformation.
- Visualize the Transformed Data
When the pipeline finishes, inspect the output of the Clip Values node. How many rows are now missing data?
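To build intuition for what the transformation in this lab does, here is a minimal plain-Python sketch. It is not the Clip Values module's implementation, and the module itself is configured in the Designer rather than in code. The function name `clip_extremes_to_missing` and the sample values are hypothetical, and reading "top and bottom 3 values" as the three smallest and three largest distinct values is an assumption. The sketch overwrites values in place rather than adding a column, and it leaves values that were already missing untouched.

```python
import math


def clip_extremes_to_missing(values, k):
    """Replace the k smallest and k largest distinct values with NaN.

    Values that are already NaN stay NaN. The result has the same
    length as the input, so no rows or columns are added or dropped.
    """
    # Distinct non-missing values, sorted ascending.
    present = sorted({v for v in values if not math.isnan(v)})
    if len(present) <= 2 * k:
        raise ValueError("not enough distinct values to clip k from each end")

    # Thresholds: the smallest and largest values that should be kept.
    lower, upper = present[k], present[-k - 1]

    # Comparisons with NaN are always False, so existing NaNs fall through
    # to the NaN branch unchanged.
    return [v if lower <= v <= upper else math.nan for v in values]


def count_missing(values):
    """Count NaN entries in a list of floats."""
    return sum(math.isnan(v) for v in values)


# Hypothetical sample standing in for the `length` column (one value missing).
lengths = [141.1, 144.6, 150.0, 155.9, 160.2, 165.3, math.nan,
           170.7, 175.4, 180.3, 190.9, 199.6, 208.1]

clipped = clip_extremes_to_missing(lengths, k=3)
missing_before = count_missing(lengths)
missing_after = count_missing(clipped)
```

Comparing `missing_before` with `missing_after` is the same check the lab asks for when you inspect the output of the Clip Values node: clipped values that are replaced with missing show up as additional missing entries in the column.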