In this post, we’ll talk about how Amazon SageMaker’s Linear Learner algorithm is a great place to start if you’re new to machine learning or looking for a hands-on intro to building AWS ML skills.
This is the first in a series of posts about Amazon SageMaker’s built-in algorithms. This series does assume that you have prior experience with machine learning. If you’d like to learn more, watch our Introduction to Machine Learning course.
I started my machine learning journey four years ago as a Java software engineering manager wanting to make a transition. I didn’t know where to start, so I turned to Amazon Web Services (AWS) to level up my skills. Fast-forward to today: I’ve spoken on the AWS re:Invent and TED stages about machine learning, been named an AWS Machine Learning Hero, and earned my AWS Certified Machine Learning – Specialty certification.
Supervised Learning and Binary Classification
The first machine learning problem type I solved was supervised learning with binary classification. I found this use case to be simple and easy to understand. In this use case, you teach the machine to answer simple Yes/No questions by giving it data samples already labeled with the answer you want it to learn how to predict.
Back then, I started with the Amazon Machine Learning service but quickly graduated to Amazon SageMaker. If you’re not familiar with Amazon SageMaker, it’s machine-learning-as-a-service and provides an end-to-end environment for preparing, building, training, and deploying machine learning models.
I found it easy to replicate my Amazon Machine Learning use case using Amazon SageMaker. The built-in Linear Learner algorithm allowed me to be more hands-on using Python, Jupyter Notebooks, and various data science libraries to train my model.
Linear Learner Algorithm
The beauty of Amazon SageMaker is that it comes with a range of built-in algorithms that cover many problem types.
| Learning Type | Problem Type | Built-in Algorithms |
|---|---|---|
| Supervised | Classification and regression | Linear Learner, K-Nearest Neighbors (KNN) |
| Supervised, RNN | Time-series forecasting | DeepAR |
| Unsupervised | Clustering | K-Means |
| Unsupervised | Dimensionality reduction | Principal Component Analysis (PCA) |
| Unsupervised | Anomaly detection | Random Cut Forest (RCF) |
| Unsupervised | IP anomaly detection | IP Insights |
| Unsupervised | Topic modeling | Latent Dirichlet Allocation (LDA), Neural Topic Model (NTM) |
| Textual Analysis | Text classification | BlazingText |
| Textual Analysis | Machine translation | Sequence-to-Sequence (seq2seq) |
| Image Processing | Image and multi-label classification | Image Classification |
| Image Processing | Object detection and classification | Object Detection |
| Image Processing | Pixel-level classification | Semantic Segmentation |
Linear Learner worked perfectly for my use case because it solves binary classification problems (among others).
Problem Types Solved by Linear Learner
The Linear Learner algorithm solves three problem types.
| Model | Problem Type | What It Answers | Example Questions |
|---|---|---|---|
| Logistic regression | Binary classification | Yes/No questions, by predicting 0 or 1 | Is this email spam or not? Is this transaction fraudulent or not? Is crime likely or not? |
| Multinomial logistic regression | Multi-class classification | Which-one-of-several questions, by predicting a class label from 0 to n-1 | Is this item a book, movie, or toy? Is this animal a dog, bird, or cat? |
| Linear regression | Regression | Questions whose answer is a continuous number | What will the temperature be in Atlanta tomorrow? How many units of this product will sell? What will this house sell for? |
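All three rows share the same linear core: a weighted sum of the features, w·x + b. The standard textbook formulations below (general definitions, not taken from the SageMaker documentation) show how each problem type turns that sum into an answer. The 0.5 cutoff for binary classification is a common illustrative choice, not necessarily the threshold Linear Learner ends up using.

$$
\text{Binary classification:}\quad \hat{p} = \sigma(\mathbf{w}^\top\mathbf{x} + b) = \frac{1}{1 + e^{-(\mathbf{w}^\top\mathbf{x} + b)}}, \qquad \hat{y} = \begin{cases} 1 & \text{if } \hat{p} \ge 0.5 \\ 0 & \text{otherwise} \end{cases}
$$

$$
\text{Multi-class classification:}\quad P(y = k \mid \mathbf{x}) = \frac{e^{\mathbf{w}_k^\top\mathbf{x} + b_k}}{\sum_{j=0}^{n-1} e^{\mathbf{w}_j^\top\mathbf{x} + b_j}}, \qquad \hat{y} = \arg\max_{k \in \{0, \dots, n-1\}} P(y = k \mid \mathbf{x})
$$

$$
\text{Regression:}\quad \hat{y} = \mathbf{w}^\top\mathbf{x} + b
$$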
How it works
For training, Linear Learner requires a data matrix whose rows represent observations and whose columns represent features. One column must hold the label that you want the machine to learn to predict. When you train with CSV data, the label must be in the first column of the data matrix, and column headers must be excluded.
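Here is a minimal sketch of that layout using only Python’s standard csv module. It writes dict records as headerless CSV with the label first. The column names (hour, weekday, crime) are hypothetical placeholders, not fields from the dataset used later in this post.

```python
import csv
import io


def to_linear_learner_csv(records, label, features):
    """Serialize dict records as headerless CSV with the label in the first column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for record in records:
        # Label first, then the features in a fixed order; no header row is written.
        writer.writerow([record[label]] + [record[name] for name in features])
    return buffer.getvalue()


# Hypothetical records: 1 = crime, 0 = no crime.
rows = [
    {"hour": 14, "weekday": 2, "crime": 1},
    {"hour": 3, "weekday": 6, "crime": 0},
]
print(to_linear_learner_csv(rows, label="crime", features=["hour", "weekday"]), end="")
# 1,14,2
# 0,3,6
```

Passing the feature names explicitly keeps the column order stable, which matters because the trained model only sees positions, not names.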
For my use case, I stored the training data in Amazon S3, specifying an input bucket as the source of the training data and an output bucket to hold the final model artifacts. Linear Learner supports training data in RecordIO (protobuf) or CSV format, and accepts inference requests in JSON, CSV, or RecordIO (protobuf). For my initial use case, I used CSV. There are two modes for loading the S3 data onto the training instances: file and pipe. While I used file mode initially, pipe mode is more efficient: it streams your data to the training algorithm instead of first copying the full training dataset to disk, as file mode does, which reduces training time and saves money.
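For reference, here is where the input mode appears when you create a training job through the low-level API. This sketch only builds one entry of the InputDataConfig list that boto3’s create_training_job accepts; it makes no AWS calls, and the bucket path is a hypothetical placeholder.

```python
def make_channel(name, s3_uri, content_type="text/csv", input_mode="File"):
    """Build one InputDataConfig channel in the shape create_training_job expects."""
    if input_mode not in ("File", "Pipe"):
        raise ValueError("input_mode must be 'File' or 'Pipe'")
    return {
        "ChannelName": name,  # e.g. "train", "validation", or "test"
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",  # use every object under this prefix
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "ContentType": content_type,
        # File copies the data to disk before training; Pipe streams it.
        "InputMode": input_mode,
    }


train_channel = make_channel("train", "s3://my-bucket/crime/train/", input_mode="Pipe")
print(train_channel["InputMode"])  # Pipe
```

Setting InputMode per channel overrides the job-level default, so you could, for example, stream a large training channel while copying a small validation channel to disk.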
When configuring your training job, you can set a number of hyperparameters. Below are a few of the most important ones for model training. The full list of hyperparameters for the Linear Learner algorithm can be found in the Amazon SageMaker Developer Guide.
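| Hyperparameter | Description | Values |
|---|---|---|
| predictor_type | The type of target variable. For binary classification, I selected binary_classifier. | binary_classifier, multiclass_classifier, or regressor (required) |
| epochs | The number of passes over the data. | A positive integer; default is 15 |
| feature_dim | The number of features in the input data. | auto or a positive integer; default is auto |
| l1 | The L1 regularization value. | auto or a non-negative float; default is auto |
| wd | The L2 regularization (weight decay) value. | auto or a non-negative float; default is auto |
| optimizer | The optimization algorithm. | auto, sgd, adam, or rmsprop; default is auto (currently adam) |
| learning_rate | The step size for the optimizer. | auto or a positive float; default is auto |
| loss | The loss function. | Varies with predictor_type. The default, auto, resolves to logistic for binary_classifier, softmax_loss for multiclass_classifier, and squared_loss for regressor |
| mini_batch_size | The number of observations per batch. | A positive integer; default is 1000 |
| num_models | Linear Learner trains multiple models in parallel; this sets how many are trained and compared against each other. | auto or a positive integer; default is auto |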
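The CreateTrainingJob API takes hyperparameters as a map of strings to strings, so a small helper can assemble and serialize them. This is an illustrative sketch (the helper names are hypothetical); the auto-loss mapping mirrors the loss row of the table.

```python
# Loss function that loss="auto" resolves to for each predictor_type.
AUTO_LOSS = {
    "binary_classifier": "logistic",
    "multiclass_classifier": "softmax_loss",
    "regressor": "squared_loss",
}


def linear_learner_hyperparameters(predictor_type, feature_dim, epochs=15,
                                   mini_batch_size=1000, loss="auto", **extra):
    """Assemble Linear Learner hyperparameters as the string-to-string map the API expects."""
    if predictor_type not in AUTO_LOSS:
        raise ValueError(f"unknown predictor_type: {predictor_type}")
    if predictor_type == "multiclass_classifier" and "num_classes" not in extra:
        raise ValueError("multiclass_classifier also needs num_classes")
    params = {
        "predictor_type": predictor_type,
        "feature_dim": feature_dim,
        "epochs": epochs,
        "mini_batch_size": mini_batch_size,
        "loss": loss,
        **extra,  # e.g. l1, wd, optimizer, learning_rate, num_models
    }
    return {name: str(value) for name, value in params.items()}


def effective_loss(hyperparameters):
    """Return the loss function that will actually be used, resolving "auto"."""
    loss = hyperparameters.get("loss", "auto")
    return AUTO_LOSS[hyperparameters["predictor_type"]] if loss == "auto" else loss


# feature_dim=4 is a hypothetical feature count.
hp = linear_learner_hyperparameters("binary_classifier", feature_dim=4, num_models=32)
print(hp["epochs"], effective_loss(hp))  # 15 logistic
```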
When you are ready to train your machine learning model, you can run training on single- or multi-machine CPU and GPU instances. Linear Learner supports distributed training, but it doesn’t support incremental training, so you can’t resume training from a previously trained model; each training job starts from scratch.
Linear Learner reports several metrics to help you evaluate and tune your model before releasing it to production. Each metric is reported as both a test and validation metric.
- Objective Loss – This represents the mean value of the loss function. For my use case, the loss is logistic loss.
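- Accuracy – This is the fraction of all predictions that are correct: (true positives + true negatives) / total predictions.
- Precision – Of all the predicted positives, this is the fraction that are actually positive: true positives / (true positives + false positives).
- Recall – Of all the actual positives, this is the fraction the model predicted as positive: true positives / (true positives + false negatives).
- F1 Score – This is the harmonic mean of precision and recall, which balances the two in a single number.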
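To make these definitions concrete, here is a plain-Python sketch that computes them from true labels, predicted labels, and predicted probabilities. It is a textbook implementation for building intuition, not how SageMaker computes its metrics internally, and the sample labels are made up.

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def mean_logistic_loss(y_true, scores):
    """Mean logistic (log) loss, where scores are predicted probabilities of class 1."""
    eps = 1e-15  # clip to avoid log(0)
    total = 0.0
    for t, s in zip(y_true, scores):
        s = min(max(s, eps), 1 - eps)
        total += -(t * math.log(s) + (1 - t) * math.log(1 - s))
    return total / len(y_true)


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
# accuracy 0.625, precision ~0.667, recall 0.5, F1 ~0.571
```

The example shows why the metrics differ: the model rarely raises false alarms (one false positive), so precision is higher than recall, which suffers from the two missed positives.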
Linear Learner in Action
Now let’s see a real-world example. The aim of my use case is to use SageMaker’s Linear Learner algorithm to train a linear model for crime prediction. For this illustration, stop-and-search crime data was pulled from the data.police.uk dataset available at https://data.police.uk/data/. data.police.uk is a site for open data about crime and policing in England, Wales, and Northern Ireland.
The purpose here is to use this dataset to build a predictive policing model to determine whether or not crime is likely given the following data points:
- Time of Day
- Day of Week
The model returns a ‘Crime’ or ‘No Crime’ prediction based on the input provided. The sample code for the use case is freely available. To start the process, I launched an Amazon SageMaker hosted Jupyter notebook.
I imported the necessary data science and SageMaker Python libraries in the notebook.
Next, I read the dataset from its online URL into memory for preprocessing before training.
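A minimal standard-library sketch of this step, under stated assumptions: it parses CSV text into rows and derives the Time of Day and Day of Week data points from a timestamp. It assumes a column named Date holding ISO 8601 timestamps such as 2019-01-01T00:20:00+00:00; check the actual column names and formats in your download from data.police.uk. In a notebook, pandas.read_csv would typically handle the parsing step.

```python
import csv
import io
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def load_rows(csv_text):
    """Parse CSV text (with a header row) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def add_time_features(row, date_column="Date"):
    """Add 'Time of Day' (hour 0-23) and 'Day of Week' (name) from an ISO 8601 timestamp.

    The column name and timestamp format are assumptions about the source data.
    """
    stamp = datetime.fromisoformat(row[date_column])
    row["Time of Day"] = stamp.hour
    row["Day of Week"] = DAY_NAMES[stamp.weekday()]  # weekday(): Monday == 0
    return row


sample = "Date,Gender,Outcome\n2019-01-01T00:20:00+00:00,Male,Arrest\n"  # made-up row
rows = [add_time_features(r) for r in load_rows(sample)]
print(rows[0]["Time of Day"], rows[0]["Day of Week"])  # 0 Tuesday
```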
Data Inspection and Visualization
Once the dataset is imported, it’s typical as part of the machine learning process to inspect the data, understand the distributions, and determine what type(s) of preprocessing might be needed.
I inspected the first few rows of the data.
I used a histogram to understand distributions.
I analyzed the crime count across counties.
I used a crosstab to understand the distributions across gender.
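The counting behind these views can be sketched without any dependencies: value_counts tallies one column (the raw numbers behind a histogram or a per-county chart), and crosstab counts pairs of values, similar to what pandas.crosstab produces. The column names and rows below are hypothetical.

```python
from collections import Counter


def value_counts(rows, column):
    """Count how often each value appears in one column."""
    return Counter(row[column] for row in rows)


def crosstab(rows, row_column, col_column):
    """Count co-occurrences of two columns as {row_value: Counter({col_value: count})}."""
    table = {}
    for row in rows:
        table.setdefault(row[row_column], Counter())[row[col_column]] += 1
    return table


# Hypothetical, already-labeled rows (1 = crime, 0 = no crime).
rows = [
    {"Gender": "Male", "County": "Kent", "Crime": 1},
    {"Gender": "Female", "County": "Kent", "Crime": 0},
    {"Gender": "Male", "County": "Essex", "Crime": 0},
]
print(value_counts(rows, "County"))  # Counter({'Kent': 2, 'Essex': 1})
print(crosstab(rows, "Gender", "Crime"))
```

A skewed crosstab is worth noticing before training, because a linear model will happily learn whatever imbalance the data contains.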
After visualizing and understanding the data, I removed null or bad values. In this stage of my learning, I did not consider any data imputation techniques.
I found that the gender and average age fields had several observations with missing or invalid values, so I removed those rows from the dataset.
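One way to express this cleanup, sketched in plain Python, is to keep only rows whose required columns are present and don’t hold an obvious placeholder. The set of invalid markers and the column names are assumptions for illustration; adjust them to what your data actually contains.

```python
# Values treated as missing (an assumption; inspect your data to choose these).
INVALID = ("", "NA", "null", None)


def drop_incomplete(rows, required, invalid=INVALID):
    """Keep only rows whose required columns exist and hold a usable value."""
    return [row for row in rows if all(row.get(col) not in invalid for col in required)]


rows = [
    {"Gender": "Male", "Age": "25-34"},
    {"Gender": "", "Age": "18-24"},
    {"Gender": "Female"},  # no Age value at all
]
print(drop_incomplete(rows, required=["Gender", "Age"]))  # [{'Gender': 'Male', 'Age': '25-34'}]
```

Dropping rows is the simplest option; imputation would keep more data but adds its own assumptions.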
Data Encoding and Transformation
Before training, I converted the categorical features into numeric features, since Linear Learner only works with numeric values.
I converted “day of week” to its numerical representation.
I converted Gender to a numerical representation.
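A minimal sketch of both conversions, assuming Day of Week holds English day names and Gender takes the values Female and Male (the numeric mapping is hypothetical). Mapping days to 0 through 6 is ordinal encoding, which implies an order (Sunday ranks “above” Monday); the one_hot helper shows the common alternative that avoids that.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
DAY_TO_NUMBER = {day: index for index, day in enumerate(DAYS)}  # Monday=0 ... Sunday=6
GENDER_TO_NUMBER = {"Female": 0, "Male": 1}  # hypothetical mapping


def encode_row(row):
    """Return a copy of the row with Day of Week and Gender replaced by integers."""
    encoded = dict(row)
    encoded["Day of Week"] = DAY_TO_NUMBER[row["Day of Week"]]
    encoded["Gender"] = GENDER_TO_NUMBER[row["Gender"]]
    return encoded


def one_hot(value, categories):
    """Alternative encoding: one 0/1 indicator per category, with no implied order."""
    return [1 if value == category else 0 for category in categories]


print(encode_row({"Day of Week": "Tuesday", "Gender": "Male", "Time of Day": 14}))
# {'Day of Week': 1, 'Gender': 1, 'Time of Day': 14}
print(one_hot("Tuesday", DAYS))  # [0, 1, 0, 0, 0, 0, 0]
```

One-hot encoding adds columns (seven for the days instead of one), so feature_dim grows accordingly.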
Splitting into Training, Validation, and Test Sets
To prevent model overfitting and to allow me to test the model’s accuracy on data that it hadn’t seen yet, I split the dataset into training, validation, and test sets.
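A minimal sketch of a seeded shuffle-and-split in plain Python. The 70/15/15 proportions and the seed are illustrative choices, not the settings used in this post; fixing the seed makes the split reproducible across notebook runs.

```python
import random


def train_validation_test_split(rows, train_frac=0.7, validation_frac=0.15, seed=42):
    """Shuffle with a fixed seed, then cut into train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_validation = round(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever remains
    return train, validation, test


train, validation, test = train_validation_test_split(range(100))
print(len(train), len(validation), len(test))  # 70 15 15
```

Shuffling before splitting matters when the source file is sorted, for example by date or by police force, so that no split ends up covering only one slice of the data.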
Training the Linear Model
After I uploaded the cleaned-up training data to S3, I trained the model using Linear Learner. The first step is to set the container image.
Then, I set up the necessary hyperparameters and the bucket location for the final model artifact.
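At the API level, training comes down to one CreateTrainingJob request that names the container image, the S3 channels, the output location, the instances, and the hyperparameters. The sketch below only assembles that request as a dict in the shape boto3’s create_training_job expects; every job name, ARN, URI, and instance type is a hypothetical placeholder. In the SageMaker Python SDK, sagemaker.image_uris.retrieve("linear-learner", region) looks up the actual image URI for your region.

```python
def build_training_job_request(job_name, image_uri, role_arn, train_s3, validation_s3,
                               output_s3, hyperparameters,
                               instance_type="ml.m5.large", instance_count=1):
    """Assemble keyword arguments for boto3's create_training_job (makes no AWS call)."""

    def csv_channel(name, s3_uri):
        return {
            "ChannelName": name,
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
            "ContentType": "text/csv",
        }

    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
        "RoleArn": role_arn,
        "InputDataConfig": [csv_channel("train", train_s3),
                            csv_channel("validation", validation_s3)],
        # Model artifacts are written under this S3 prefix.
        "OutputDataConfig": {"S3OutputPath": output_s3},
        # InstanceCount > 1 requests multi-machine (distributed) training.
        "ResourceConfig": {"InstanceType": instance_type,
                           "InstanceCount": instance_count,
                           "VolumeSizeInGB": 10},
        # The API requires every hyperparameter value to be a string.
        "HyperParameters": {key: str(value) for key, value in hyperparameters.items()},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


request = build_training_job_request(
    job_name="crime-linear-learner-001",                      # hypothetical
    image_uri="<linear-learner image URI for your region>",   # placeholder
    role_arn="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    train_s3="s3://my-bucket/crime/train/",
    validation_s3="s3://my-bucket/crime/validation/",
    output_s3="s3://my-bucket/crime/output/",
    hyperparameters={"predictor_type": "binary_classifier", "feature_dim": 4,
                     "mini_batch_size": 100},
)
# With AWS credentials configured:
# import boto3
# boto3.client("sagemaker").create_training_job(**request)
```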
During each epoch, the evaluation metrics are logged. I used this to determine how well the model was performing at each pass over the dataset. These scores helped me tune my model for better performance.
Once I had a trained model, I hosted it on Amazon SageMaker so other users and applications could access it using the model endpoint.
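To call the hosted model, an application sends feature rows to the endpoint. This sketch uses boto3’s sagemaker-runtime invoke_endpoint with a CSV body (features only, no label column) and parses the JSON response, which for Linear Learner classifiers carries a predicted_label and a score per row. The endpoint name is a hypothetical placeholder, and only the two pure helpers run without AWS credentials.

```python
import json


def to_csv_payload(feature_rows):
    """Serialize feature rows (no label column) as a text/csv request body."""
    return "\n".join(",".join(str(value) for value in row) for row in feature_rows)


def parse_predictions(body):
    """Extract (predicted_label, score) pairs from a Linear Learner classifier JSON response."""
    return [(p["predicted_label"], p["score"]) for p in json.loads(body)["predictions"]]


def predict(endpoint_name, feature_rows):
    """Send rows to a deployed endpoint; requires boto3 and AWS credentials."""
    import boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Accept="application/json",
        Body=to_csv_payload(feature_rows),
    )
    return parse_predictions(response["Body"].read())


print(to_csv_payload([[14, 2], [3, 6]]))
print(parse_predictions('{"predictions": [{"score": 0.8, "predicted_label": 1}]}'))  # [(1, 0.8)]
# predict("crime-endpoint", [[14, 2]])  # hypothetical endpoint name
```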
Delete Model Endpoint
When I was finished using my model for predictions, I deleted the endpoint to make sure that I would no longer be charged for it.
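A sketch of that cleanup with boto3’s SageMaker client. The endpoint is what keeps instances running and billing; deleting the endpoint configuration and model as well just removes leftover resources (the model artifact in S3 is not deleted). The resource names are hypothetical, and the client is passed in so the function can be exercised without AWS. If you deployed with the SageMaker Python SDK, Predictor.delete_endpoint() covers the endpoint step.

```python
def delete_hosting_resources(sagemaker_client, endpoint_name,
                             endpoint_config_name=None, model_name=None):
    """Delete an endpoint and, optionally, its endpoint config and model."""
    # The endpoint is what keeps instances running (and billing).
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    # These are configuration objects; deleting them tidies up the account.
    if endpoint_config_name:
        sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    if model_name:
        sagemaker_client.delete_model(ModelName=model_name)


# Usage with AWS credentials configured (names are hypothetical):
# import boto3
# delete_hosting_resources(boto3.client("sagemaker"), "crime-endpoint",
#                          "crime-endpoint-config", "crime-model")
```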
Learn more about machine learning
Amazon SageMaker’s Linear Learner algorithm is a great place to start if you’re new to machine learning. I found answering simple Yes/No questions to be the easiest use case.
In the next post in this series, I’ll be reviewing the K-Means built-in algorithm that is used for clustering and finding discrete groupings within data.
Looking to learn more about machine learning? Check out ACG’s Introduction to Machine Learning and AWS Certified Machine Learning – Specialty certification courses.