Creating a scikit-learn Random Forest Classifier in AWS SageMaker

1 hour
  • 5 Learning Objectives

About this Hands-on Lab

Scikit-learn is a great place to start working with machine learning. In this lab, we will use scikit-learn to create a Random Forest Classifier that predicts whether a person prefers cats or dogs. The data set is entirely made up, but it could easily be swapped for one of your own!

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Navigate to the Jupyter Notebook

Log in to the AWS console and navigate to the AWS SageMaker page. From there, load the Jupyter Notebook that has been provided with this hands-on lab.

Load and Prepare the Data

  1. Load the survey data from data.csv, located beside the notebook.
  2. View the data. Look at both the raw data and statistics for the data.
  3. Change the column data types so the model can understand them.
  4. Split the data into training and testing sets. Use 80% of the data for training.
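
The four steps above can be sketched as follows. This is only a minimal illustration: the column names (color, walk, run, miles_walked, pet), the inline CSV text standing in for data.csv, the one-hot encoding of the color column, and the random_state value are all assumptions, since the lab's actual file layout is not shown here.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for data.csv; the real column names may differ.
csv_text = """color,walk,run,miles_walked,pet
green,1,0,2.5,dog
blue,0,0,0.5,cat
red,1,1,4.0,dog
blue,1,0,1.0,cat
green,0,0,0.2,cat
red,1,1,3.5,dog
blue,0,0,0.8,cat
green,1,0,3.0,dog
red,0,0,1.2,cat
blue,1,1,4.5,dog
"""

# 1. Load the data (in the lab this would be pd.read_csv("data.csv")).
df = pd.read_csv(io.StringIO(csv_text))

# 2. View the raw data and summary statistics.
print(df.head())
print(df.describe())

# 3. Convert text columns into numeric types the model can use.
df["pet"] = df["pet"].astype("category")
y = df["pet"].cat.codes  # categories sort alphabetically: cat -> 0, dog -> 1
X = pd.get_dummies(df.drop(columns="pet"), columns=["color"])  # one-hot colors

# 4. Hold out 20% of the rows for testing; train on the other 80%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=42
)
print(len(X_train), "training rows,", len(X_test), "testing rows")
```

Fixing random_state makes the split repeatable between notebook runs; drop it if you want a different split each time.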

Train the Random Forest Model

  1. Create a Random Forest Classifier model using scikit-learn.
  2. Train the model using the training data.
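
A minimal sketch of creating and training the model is shown below. The synthetic data, the column names, the integer-coded color column, and the hyperparameters (n_estimators=100, random_state=0) are assumptions chosen for illustration, not the lab's actual values.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Made-up survey data standing in for the lab's data.csv (199 responses).
rng = np.random.default_rng(0)
n = 199
df = pd.DataFrame({
    "walk": rng.integers(0, 2, n),          # 1 = likes walking
    "run": rng.integers(0, 2, n),           # 1 = likes running
    "color": rng.integers(0, 3, n),         # favorite color as an integer code
    "miles_walked": rng.uniform(0, 5, n).round(1),
})
# Hypothetical rule: people who walk more tend to prefer dogs (1 = dog, 0 = cat).
df["dog"] = ((df["miles_walked"] + rng.normal(0, 1, n)) > 2.5).astype(int)

X = df.drop(columns="dog")
y = df["dog"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0
)

# 1. Create the classifier: an ensemble of 100 decision trees.
model = RandomForestClassifier(n_estimators=100, random_state=0)

# 2. Train it on the training rows only; the test rows stay unseen.
model.fit(X_train, y_train)
print("Trees in the forest:", len(model.estimators_))
```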

Evaluate the Model

  1. Generate predictions for the testing data set.
  2. View the confusion matrix for the predictions.
  3. Calculate the sensitivity and specificity.
  4. Plot the ROC curve.
  5. Calculate the area under the curve.
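
The evaluation steps above might look like the sketch below. Sensitivity is the true positive rate, TP / (TP + FN), and specificity is the true negative rate, TN / (TN + FP). The synthetic data, the choice of "dog" as the positive class, and the helper name plot_roc are assumptions made for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Made-up data: columns are walk, run, miles walked; label 1 = dog, 0 = cat.
rng = np.random.default_rng(0)
n = 199
miles = rng.uniform(0, 5, n)
X = np.column_stack([rng.integers(0, 2, n), rng.integers(0, 2, n), miles])
y = ((miles + rng.normal(0, 1, n)) > 2.5).astype(int)

# Stratify so both classes appear in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0, stratify=y
)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# 1. Predictions (hard labels) and scores (probability of class 1).
y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]

# 2. Confusion matrix; rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)

# 3. Sensitivity and specificity.
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")

# 4. Points on the ROC curve, and 5. the area under it.
fpr, tpr, thresholds = roc_curve(y_test, y_score)
roc_auc = roc_auc_score(y_test, y_score)
print(f"AUC={roc_auc:.2f}")


def plot_roc(fpr, tpr, roc_auc):
    """Hypothetical helper: draw the ROC curve with matplotlib."""
    import matplotlib.pyplot as plt

    plt.plot(fpr, tpr, label=f"ROC (AUC = {roc_auc:.2f})")
    plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
    plt.xlabel("False positive rate (1 - specificity)")
    plt.ylabel("True positive rate (sensitivity)")
    plt.legend()
    plt.show()
```

In a notebook cell, calling plot_roc(fpr, tpr, roc_auc) draws the curve. An AUC near 0.5 means the model is doing no better than chance, while an AUC near 1.0 means it separates the two classes well.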

Predict for Yourself

  1. Create a survey response for yourself.
  2. Have the model predict if you prefer cats or dogs.
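
As a rough sketch, a single survey response can be passed to the trained model as a one-row DataFrame whose columns match the training data. The column names, the integer color codes, the synthetic training data, and the example answers below are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Made-up training data standing in for the lab's survey (1 = dog, 0 = cat).
rng = np.random.default_rng(0)
n = 199
train = pd.DataFrame({
    "walk": rng.integers(0, 2, n),
    "run": rng.integers(0, 2, n),
    "color": rng.integers(0, 3, n),
    "miles_walked": rng.uniform(0, 5, n).round(1),
})
labels = ((train["miles_walked"] + rng.normal(0, 1, n)) > 2.5).astype(int)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(train, labels)

# 1. Your own survey response, using the same columns in the same order.
me = pd.DataFrame(
    [{"walk": 1, "run": 0, "color": 2, "miles_walked": 3.0}],
    columns=train.columns,
)

# 2. Ask the model for a label and for the class probabilities behind it.
pred = model.predict(me)[0]
proba = model.predict_proba(me)[0]  # [P(cat), P(dog)]
answer = "dog" if pred == 1 else "cat"
print(f"The model thinks you are a {answer} person "
      f"(P(cat)={proba[0]:.2f}, P(dog)={proba[1]:.2f})")
```

Any preprocessing applied to the training data (type conversions, encodings) has to be applied to your own response in the same way, otherwise the columns will not line up with what the model saw during training.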

Additional Resources


You run a local pet shop. To help you determine what kind of products to recommend to new customers, you had a company conduct a survey of 199 of your current customers. Your customers were asked what their favorite color is, whether they like walking and running, and how many miles they walk in a day. Lastly, and most importantly, they were asked whether they prefer cats or dogs. Do any of those answers actually matter for determining whether someone is a dog person or a cat person? Let's let the machine decide!

The files used in this lab can be found on GitHub.

Lab Goals

  1. Navigate to the Jupyter Notebook
  2. Load and Prepare the Data
  3. Create the scikit-learn Model
  4. Evaluate the Model
  5. Predict for Yourself

Logging in to the lab environment

To avoid issues, log in to the lab from a new Incognito or Private browser window. This ensures that any personal account credentials active in your main window are not used for the lab.

Please make sure you are in the us-east-1 (N. Virginia) region when in the AWS console.
