- Lab
- A Cloud Guru
Classification Options in Azure Machine Learning
Machine Learning Studio provides many classification algorithm modules for both binary and multi-class classification tasks. With all of these options, how do you know which is best suited to your dataset? How do you compare the performance of very different models? In this lab, we will experiment with different classification algorithms to find the best model for our data.
Table of Contents
-
Challenge
Set Up the Workspace
- Log in and go to the Azure Machine Learning Studio workspace provided in the lab.
- Create a Training Cluster of D2 instances. Set the max cluster nodes to 2. You will need a lot of compute in this lab.
- Create a new blank Pipeline in the Azure Machine Learning Studio Designer.
-
Challenge
Train Multiple Classification Models
- The necessary data is in the CRM Dataset Shared and CRM Upselling Labels Shared dataset nodes.
- Change all missing values in CRM Dataset Shared to zeroes.
- Join the cleaned data with the labels found in CRM Upselling Labels Shared.
- Split the data into training and testing datasets, with 70% of the data being used for training. Be sure to use a random seed for repeatable results.
- Set up models using each of the following machine learning algorithms. Use the default algorithm options, except as noted:
  - Two-Class Support Vector Machine
  - Two-Class Logistic Regression
  - Two-Class Boosted Decision Tree
  - Two-Class Decision Forest
  - Two-Class Averaged Perceptron
    - Set Learning rate to 0.1
  - Two-Class Neural Network
    - Set Number of hidden nodes to 50
    - Set Number of learning iterations to 25
- Train each of those models with Col1 (from the Upselling Labels) as the label. Make sure to use the training data, not the testing data.
- Generate predictions on the testing data.
- Generate statistics from the predictions.
- Submit the pipeline.
  Note: Due to the large number of operations in this pipeline, it will take 10-15 minutes to complete. Grab a coffee or watch another lecture video while you wait.
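The data preparation steps in this challenge (zero-filling missing values, attaching the labels, and a seeded 70/30 split) can be sketched outside the Designer in plain Python. This is only an illustration under stated assumptions: the toy rows, the function names, and the idea of pairing features with labels by row position are all hypothetical and are not taken from the lab's datasets or from the Designer modules themselves.

```python
import random


def fill_missing_with_zero(rows):
    """Return a copy of rows with every missing value (None) replaced by 0."""
    return [[0 if value is None else value for value in row] for row in rows]


def join_labels(features, labels):
    """Append each label to its feature row, pairing rows by position.

    Pairing by position is an assumption made for this sketch.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same number of rows")
    return [row + [label] for row, label in zip(features, labels)]


def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle with a fixed seed, then cut into training and testing lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # same seed, same order on every run
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy data standing in for the two CRM datasets.
features = [
    [1.0, None], [None, 2.0], [3.0, 4.0], [5.0, None], [0.5, 1.5],
    [2.5, None], [None, None], [7.0, 8.0], [9.0, 1.0], [4.0, 6.0],
]
labels = [1, 0, 0, 1, 0, 1, 0, 1, 0, 1]

joined = join_labels(fill_missing_with_zero(features), labels)
train, test = split_rows(joined, train_fraction=0.7, seed=42)
print(len(train), "training rows,", len(test), "testing rows")
```

Passing the same seed again reproduces the same split, which is what makes the later model comparison fair: every algorithm is trained and scored on identical rows.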
-
Challenge
Compare the Classification Models
- Right-click the CRM Upselling Labels Shared module, and select Visualize.
- Right-click the Apply SQL Transformation module, and select Visualize Result dataset.
- Right-click the first Evaluate Model module, and select Visualize Evaluation results.
  - Take note of the AUC value.
- Right-click the second Evaluate Model module, and select Visualize Evaluation results.
  - Take note of the AUC value. Is the value lower or higher than the first model?
- Right-click the final Evaluate Model module, and select Visualize Evaluation results.
  - Take note of the AUC value. Is the value lower or higher than the other models?
- Select the model with the highest AUC value.
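To make the AUC comparison concrete, the following sketch computes AUC from its pairwise definition: the fraction of (positive, negative) example pairs in which the positive example receives the higher score, with ties counted as half. The labels, scores, and model names are hypothetical, and this is not a description of how the Evaluate Model module computes the value internally.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs that are ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical test labels and scored probabilities for three models.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
model_scores = {
    "model_a": [0.9, 0.2, 0.8, 0.7, 0.3, 0.1, 0.6, 0.4],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.3, 0.8, 0.9, 0.2],
    "model_c": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
}

aucs = {name: roc_auc(y_true, s) for name, s in model_scores.items()}
best = max(aucs, key=aucs.get)
for name, auc in aucs.items():
    print(name, auc)
print("highest AUC:", best)
```

An AUC of 0.5 means the scores rank positives no better than chance, while 1.0 means every positive outranks every negative. Because AUC depends only on ranking, it allows very different algorithms to be compared without first choosing a threshold.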
-
Challenge
Determine the Optimal Threshold
- Adjust the Threshold Value, and take note of the accuracy metrics.
  - Do the results become more or less accurate as the threshold value is raised? Lowered?
  - Adjust the threshold so that the results provide the highest true positive value while keeping both false positive and false negative values low.
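The effect of moving the threshold can be sketched by recounting the confusion matrix at several cut-offs. The scores and labels below are hypothetical, and treating a score greater than or equal to the threshold as a positive prediction is an assumption of this sketch rather than a documented behavior of the Studio.

```python
def confusion_counts(labels, scores, threshold):
    """Return (tp, fp, tn, fn), predicting positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0  # assumed decision rule
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summarize(tp, fp, tn, fn):
    """Return (accuracy, precision, recall); empty denominators give 0.0."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical test labels and scored probabilities from one model.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.8, 0.45, 0.3, 0.1, 0.55, 0.4]

for threshold in (0.3, 0.5, 0.7):
    counts = confusion_counts(y_true, scores, threshold)
    accuracy, precision, recall = summarize(*counts)
    print(threshold, counts, accuracy, precision, recall)
```

Lowering the threshold turns more rows into positive predictions, which raises true positives and false positives together. Raising it does the opposite, trading false positives for false negatives, so the best setting depends on which kind of error matters more.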