Much of the time in a machine learning project is spent understanding the data and getting it into the right shape to train the model. This is the Data Wrangling, Exploration, and Cleaning phase of the machine learning lifecycle. In Azure Machine Learning designer, many common data transformation operations are provided as transform modules. In this lab, you will explore the Apply Math Operation module to gain a deeper understanding of the tools at your disposal.
Learning Objectives
Successfully complete this lab by achieving the following learning objectives:
- Set Up the Workspace
  - Log in and navigate to the Azure Machine Learning studio workspace provided in the lab.
  - In Compute, create a Training Cluster:
    - Provide a unique name for the cluster.
    - Use virtual machine size Standard_D2_v2.
    - For minimum and maximum nodes, input 2.
  - In Designer, start a new pipeline from the easy-to-use prebuilt modules:
    - Select the Training Cluster you created as the default compute for this pipeline.
    - Provide a unique name for your pipeline.
- Explore Apply Math Operation
  - From the Datasets submenu on the left, drag a CRM Upselling Labels Shared node onto the canvas. Visualize the dataset.
  - From the Data Transformations submenu on the left, drag an Apply Math Operation node onto the canvas.
  - Connect the output of CRM Upselling Labels Shared to the input of Apply Math Operation.
  - Click the Apply Math Operation node to configure it:
    - Category: Compare
    - Comparison function: PairMax
    - Value to compare type: Constant
    - Second argument: 0
      With PairMax, this will take the maximum of our provided constant 0 and the column value. For values that are currently 1, it will choose 1. For values that are currently -1, it will choose 0.
    - On Column set, click Edit column, then select Col1 and Save.
    - Change Output mode to Inplace. This will replace the value in the column without adding another column that we'd have to manage after the operation.
  - Select Submit to submit the pipeline, creating a new experiment.
  - Once the pipeline completes, right-click the Apply Math Operation node and choose Visualize Result_dataset.
    - There are still 50,000 rows and 1 column, but the values that were -1 are now 0.
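
To make the PairMax step concrete, the following is a minimal plain-Python sketch of what the configured node does to Col1. It is not the designer's own implementation. The function name pair_max_inplace, the dictionary-of-lists table, and the five sample values are assumptions made for illustration; the real dataset has 50,000 rows.

```python
def pair_max_inplace(table, column, constant):
    """Replace each value in `column` with the larger of the value and `constant`.

    Hypothetical helper that mimics Category: Compare, Comparison function: PairMax,
    Output mode: Inplace. The column is overwritten, so no new column is added.
    """
    table[column] = [max(value, constant) for value in table[column]]
    return table


# Illustrative stand-in for the label column (values are 1 or -1).
labels = {"Col1": [1, -1, -1, 1, -1]}

result = pair_max_inplace(labels, "Col1", 0)

print(result["Col1"])       # [1, 0, 0, 1, 0]
print(len(result["Col1"]))  # row count is unchanged: 5
print(list(result.keys()))  # still a single column: ['Col1']
```

As in the lab result, the number of rows and columns stays the same, and only the -1 values change.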