Scikit-learn is a great place to start working with machine learning. In this lab, we will use scikit-learn to create a Random Forest Classifier that predicts whether you prefer cats or dogs. The data set is entirely made up, but you could easily swap in one of your own!
Learning Objectives
Successfully complete this lab by achieving the following learning objectives:
- Navigate to the Jupyter Notebook
Log in to the AWS console and navigate to the Amazon SageMaker page. From there, load the Jupyter Notebook that has been provided with this hands-on lab.
- Load and Prepare the Data
- Load the survey data from data.csv, located beside the notebook.
- View the data. Look at both the raw data and statistics for the data.
- Change the column data types so the model can understand them.
- Split the data into training and testing sets. Use 80% of the data for training.
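The load-and-prepare steps above might look roughly like the sketch below. The lab's actual column names are not listed here, so `walks_per_week`, `likes_exercise`, `home_type`, and `prefers_dogs` are hypothetical, and the data is generated in memory so the example runs without data.csv. In the lab you would read the real file with `pd.read_csv("data.csv")` instead.

```python
import io

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for data.csv so this sketch runs anywhere.
# All column names below are assumptions, not the lab's real schema.
rng = np.random.default_rng(0)
n = 200
fake = pd.DataFrame({
    "walks_per_week": rng.integers(0, 8, n),
    "likes_exercise": rng.choice(["yes", "no"], n),
    "home_type": rng.choice(["apartment", "house"], n),
    "prefers_dogs": rng.choice(["yes", "no"], n),  # assumed label column
})
df = pd.read_csv(io.StringIO(fake.to_csv(index=False)))

# View the raw data and summary statistics.
print(df.head())
print(df.describe(include="all"))

# Change text columns into numeric types the model can understand.
df["likes_exercise"] = (df["likes_exercise"] == "yes").astype(int)
df["prefers_dogs"] = (df["prefers_dogs"] == "yes").astype(int)
df = pd.get_dummies(df, columns=["home_type"], dtype=int)

# Separate features from the label, then keep 80% for training.
X = df.drop(columns="prefers_dogs")
y = df["prefers_dogs"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=42
)
print(len(X_train), "training rows,", len(X_test), "testing rows")
```

Setting `random_state` is optional. It only makes the split repeatable from run to run.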
- Train the Random Forest Model
- Create a Random Forest Classifier model using scikit-learn.
- Train the model using the training data.
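As a minimal sketch of the training step, the example below fits a `RandomForestClassifier` on synthetic data from `make_classification`, which stands in for the prepared survey data. The values for `n_estimators` and `random_state` are illustrative choices, not settings prescribed by the lab.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared survey data (an assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=42
)

# Create the model. 100 trees is scikit-learn's default, stated here
# explicitly; random_state makes the result repeatable.
model = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model using only the training data.
model.fit(X_train, y_train)

print("Training accuracy:", model.score(X_train, y_train))
```

Training accuracy is usually very high for a random forest, so it says little about how well the model generalizes. The testing set is what the evaluation step is for.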
- Evaluate the Model
- Generate predictions for the testing data set.
- View the confusion matrix for the predictions.
- Calculate the sensitivity and specificity.
- Plot the ROC curve.
- Calculate the area under the curve.
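The evaluation steps above can be sketched as follows, again on synthetic stand-in data rather than the lab's survey file. Sensitivity is the true positive rate, TP / (TP + FN), and specificity is the true negative rate, TN / (TN + FP). The `plot_roc` helper is a hypothetical name introduced for this example and assumes matplotlib is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared survey data (an assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Generate predictions for the testing data set.
y_pred = model.predict(X_test)

# Confusion matrix: rows are actual classes, columns are predicted.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)

sensitivity = tp / (tp + fn)  # share of actual positives found
specificity = tn / (tn + fp)  # share of actual negatives found
print("Sensitivity:", sensitivity, "Specificity:", specificity)

# The ROC curve needs scores, so use the positive-class probability.
y_score = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_score)
roc_auc = roc_auc_score(y_test, y_score)
print("Area under the curve:", roc_auc)


def plot_roc(fpr, tpr, roc_auc):
    """Hypothetical helper: draw the ROC curve with matplotlib."""
    import matplotlib.pyplot as plt

    plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.2f}")
    plt.plot([0, 1], [0, 1], linestyle="--", label="Chance")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate (sensitivity)")
    plt.legend()
    plt.show()
```

In the notebook, calling `plot_roc(fpr, tpr, roc_auc)` draws the curve. A curve that hugs the top-left corner, with an area close to 1, indicates a model that separates the two classes well.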
- Predict for Yourself
- Create a survey response for yourself.
- Have the model predict if you prefer cats or dogs.
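To predict for yourself, one approach is to build a single-row DataFrame with the same columns the model was trained on. The sketch below uses hypothetical feature names and a made-up labeling rule (more than three walks a week means "dogs") so that it is self-contained. Your own response in the lab needs to match the columns in data.csv.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names; the lab's real columns may differ.
feature_names = ["walks_per_week", "likes_exercise", "lives_in_house"]

# Made-up training data so the sketch runs on its own.
rng = np.random.default_rng(0)
X_train = pd.DataFrame({
    "walks_per_week": rng.integers(0, 8, 200),
    "likes_exercise": rng.integers(0, 2, 200),
    "lives_in_house": rng.integers(0, 2, 200),
})
# Made-up rule: 1 (dogs) when walks_per_week is 4 or more, else 0 (cats).
y_train = (X_train["walks_per_week"] >= 4).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Create a survey response for yourself, using the training columns.
me = pd.DataFrame(
    [{"walks_per_week": 6, "likes_exercise": 1, "lives_in_house": 1}],
    columns=feature_names,
)

# Have the model predict a class and the probability behind it.
prediction = model.predict(me)[0]
probability = model.predict_proba(me)[0, 1]
print("You prefer", "dogs" if prediction == 1 else "cats")
print("Estimated probability of preferring dogs:", probability)
```

Keeping the column names and order identical to the training data matters, because the model has no other way to know which value belongs to which question.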