| Topic | Machine Learning on AWS |
|---|---|
| Goal | Build a Netflix-Style Recommendation Engine with Amazon SageMaker |
| Outcome | Gain real machine learning and AWS skills while getting hands-on with a real-world project to add to your portfolio |
| Deadline | December 31, 2020 |
Have you ever wondered how Netflix recommends movies to you? I’ve always been curious about the machine learning techniques and algorithms used behind the scenes to help me navigate the thousands of movies found on Netflix.
In this challenge, you’ll level up your machine learning skills by building a Netflix-style recommendation engine using Amazon SageMaker. Whether you’re a machine learning first-timer or a machine learning guru, there are aspects of this challenge that will advance your skills to the next level. So, let’s go!
You’ll need access to an AWS environment and Amazon SageMaker. Amazon SageMaker is part of the AWS Free Tier, so if you’ve never used it before, you can build and train your model for free within the free-tier limits. If you’re new to machine learning, watch the first video of Kesha’s Korner to get up to speed. I also recommend exploring Matplotlib, scikit-learn, and the k-means clustering algorithm before starting this challenge.
1. Determine use case and obtain data
Decide what you’d like to recommend: movies, courses, videos, products, or something else. Then find a dataset or use your own. Public data repositories such as AWS Marketplace and the UCI Machine Learning Repository may have the data you need. If you’d like to recommend movies, review the IMDb Datasets and download the files you need; title.akas.tsv.gz, title.basics.tsv.gz, and title.ratings.tsv.gz are particularly useful.
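As a starting point, here is a minimal sketch of loading one of the IMDb files with pandas, assuming you’ve downloaded it into the notebook’s working directory. The IMDb files are tab-separated and mark missing values with the literal string `\N`. The helper name `load_imdb_tsv` is hypothetical, and the tiny in-memory sample stands in for a real download so the snippet runs on its own.

```python
import csv
import io

import pandas as pd


def load_imdb_tsv(source):
    """Read an IMDb-style TSV (plain or .gz) into a DataFrame.

    Hypothetical helper: IMDb marks missing values with the literal string \\N,
    so we tell pandas to treat it as NaN.
    """
    return pd.read_csv(
        source,
        sep="\t",
        na_values=["\\N"],
        quoting=csv.QUOTE_NONE,  # some titles contain stray quote characters
        compression="infer",  # a path ending in .gz is decompressed automatically
        low_memory=False,
    )


# Stand-in for title.ratings.tsv.gz so the example runs without a download.
sample = "tconst\taverageRating\tnumVotes\ntt0000001\t5.7\t1900\ntt0000002\t\\N\t260\n"
ratings = load_imdb_tsv(io.StringIO(sample))

# With a real file you would call, for example:
# basics = load_imdb_tsv("title.basics.tsv.gz")
```

Passing the file path directly lets pandas handle the gzip decompression, so there’s no need to unzip the files first.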
2. Create a hosted Jupyter notebook
To start inspecting your data, launch a hosted Jupyter notebook on Amazon SageMaker. From there, you can use Python and data science libraries such as NumPy and pandas (with its DataFrame) to work with your data.
3. Inspect and visualize data
It’s important to gain domain knowledge of your data so that you can easily spot anomalies and outliers. There are many ways to explore and get to know your data; plotting distributions with Matplotlib is a good place to start.
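A first inspection pass might look like the sketch below: summary statistics, missing-value counts, a simple outlier check, and histograms. The `movies` table is a hypothetical stand-in for your merged data, and the 1.5 × IQR outlier rule is just one common rule of thumb, not the only option.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; drop this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample standing in for merged IMDb title and rating data.
movies = pd.DataFrame(
    {
        "primaryTitle": ["A", "B", "C", "D", "E", "F"],
        "runtimeMinutes": [95, 102, 88, 110, 97, 1200],  # 1200 is a deliberate outlier
        "averageRating": [7.1, 6.4, 5.9, 8.2, None, 6.8],
    }
)

# Summary statistics and missing-value counts give a quick first look.
print(movies.describe())
print(movies.isna().sum())

# Flag runtime outliers with the 1.5 * IQR rule of thumb.
q1, q3 = movies["runtimeMinutes"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = movies[
    (movies["runtimeMinutes"] < q1 - 1.5 * iqr)
    | (movies["runtimeMinutes"] > q3 + 1.5 * iqr)
]
print(outliers)

# Histograms make skew and outliers visible at a glance.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
movies["runtimeMinutes"].plot.hist(ax=axes[0], bins=10, title="Runtime (minutes)")
movies["averageRating"].dropna().plot.hist(ax=axes[1], bins=10, title="Average rating")
fig.tight_layout()
fig.savefig("distributions.png")  # in Jupyter the figure also renders inline
```

On the real IMDb data, a check like this quickly surfaces titles with implausible runtimes or very few votes, which you may want to filter out before training.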
4. Prepare and transform data
The next step is to put the data into a format a machine learning algorithm can learn from. You may need to combine separate data files into one, remove null values, convert strings to numbers, or do a little feature engineering.
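The transformations listed above can be sketched with pandas as follows. The two small tables are hypothetical miniatures of title.basics and title.ratings (with some numeric columns still stored as strings), and keeping only `titleType == "movie"` and one-hot encoding genres are example choices, not requirements.

```python
import pandas as pd

# Hypothetical miniatures of title.basics and title.ratings.
basics = pd.DataFrame(
    {
        "tconst": ["tt1", "tt2", "tt3", "tt4"],
        "titleType": ["movie", "movie", "tvSeries", "movie"],
        "primaryTitle": ["Alpha", "Beta", "Gamma", "Delta"],
        "startYear": ["1999", "2005", "2010", None],
        "runtimeMinutes": ["120", "95", "45", "100"],
        "genres": ["Action,Sci-Fi", "Comedy", "Drama", "Comedy,Drama"],
    }
)
ratings = pd.DataFrame(
    {
        "tconst": ["tt1", "tt2", "tt3", "tt4"],
        "averageRating": [8.7, 6.1, 7.5, 6.9],
        "numVotes": [1500, 300, 800, 50],
    }
)

# 1. Combine the separate files on their shared key.
movies = basics.merge(ratings, on="tconst", how="inner")

# 2. Keep only feature films, then drop rows with missing values.
movies = movies[movies["titleType"] == "movie"].dropna().copy()

# 3. Convert numeric strings to numbers (unparseable values become NaN).
for col in ["startYear", "runtimeMinutes"]:
    movies[col] = pd.to_numeric(movies[col], errors="coerce")

# 4. Simple feature engineering: one-hot encode the comma-separated genres.
genre_dummies = movies["genres"].str.get_dummies(sep=",")
features = pd.concat(
    [movies[["tconst", "primaryTitle", "startYear", "runtimeMinutes", "averageRating"]], genre_dummies],
    axis=1,
)
print(features)
```

One-hot encoding turns each genre into its own 0/1 column, which gives a distance-based algorithm like k-means something numeric to compare.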
5. Train the model
Now that you’ve transformed the data, start the training process using the machine learning algorithm you’ve selected. The algorithm should cluster, or group, your data so that you can make recommendations. Depending on how you approach this challenge, you may find k-means clustering useful: Amazon SageMaker provides a built-in k-means algorithm, or you can explore scikit-learn’s version.
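To see what the clustering step does before moving to managed training, here is a minimal local sketch using scikit-learn’s `KMeans` on a tiny hypothetical feature table. SageMaker’s built-in k-means trains the same kind of model on managed instances, but its SDK calls are not shown here. Choosing `n_clusters=2` is an assumption that fits the toy data; on real data you would try several values of k.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical prepared feature table: numeric columns plus one-hot genres.
features = pd.DataFrame(
    {
        "primaryTitle": ["Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta"],
        "runtimeMinutes": [120, 125, 118, 90, 88, 92],
        "averageRating": [8.1, 7.9, 8.3, 6.2, 6.0, 6.4],
        "Action": [1, 1, 1, 0, 0, 0],
        "Comedy": [0, 0, 0, 1, 1, 1],
    }
)

X = features.drop(columns=["primaryTitle"]).to_numpy(dtype=float)

# Scale features so runtime (in minutes) doesn't dominate the 0/1 genre flags.
X_scaled = StandardScaler().fit_transform(X)

# k=2 suits this toy data; on real data, compare several k (e.g. with the elbow method).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
features["cluster"] = kmeans.fit_predict(X_scaled)
print(features[["primaryTitle", "cluster"]])
```

Scaling matters because k-means uses Euclidean distance: without it, a 30-minute runtime difference would outweigh any genre difference.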
6. Make recommendations
Now that you’ve identified your clusters, use them to recommend items. If you’re recommending movies, this could be Python code that analyzes each cluster to find what its movies have in common. Once you understand those commonalities, you can recommend other, similar movies from the same cluster. Congratulations on making it this far!
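One way to turn clusters into recommendations is sketched below: average the features per cluster to see what each cluster has in common, then suggest other titles from the same cluster as a movie the user liked. The `movies` table and the `similar_titles` helper are hypothetical illustrations, not a prescribed approach.

```python
import pandas as pd

# Hypothetical clustered output from the training step.
movies = pd.DataFrame(
    {
        "primaryTitle": ["Alpha", "Beta", "Gamma", "Delta", "Epsilon"],
        "Action": [1, 1, 1, 0, 0],
        "Comedy": [0, 0, 1, 1, 1],
        "averageRating": [8.1, 7.9, 7.2, 6.2, 6.8],
        "cluster": [0, 0, 0, 1, 1],
    }
)

# Commonalities: mean feature values per cluster (e.g. the share of Action titles).
profile = movies.groupby("cluster")[["Action", "Comedy", "averageRating"]].mean()
print(profile)


def similar_titles(df, title):
    """Hypothetical helper: other titles in the same cluster as `title`."""
    cluster = df.loc[df["primaryTitle"] == title, "cluster"].iloc[0]
    same = df[(df["cluster"] == cluster) & (df["primaryTitle"] != title)]
    return same["primaryTitle"].tolist()


print(similar_titles(movies, "Alpha"))  # ['Beta', 'Gamma']
```

Here cluster 0 is entirely Action titles and cluster 1 is entirely Comedy titles, so someone who liked "Alpha" gets the other Action titles.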
7. Source control
Now that you’re finished, upload your data files and Jupyter notebook to GitHub so that we can check out your recommendation engine.
8. Clean up resources
Don’t forget to clean up your resources! At a minimum, stop your Jupyter notebook instance so that you don’t incur hourly charges for Amazon SageMaker.
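You can stop the notebook instance from the SageMaker console. If you prefer code, the sketch below uses the boto3 SageMaker client’s `describe_notebook_instance` and `stop_notebook_instance` calls, assuming you run it from outside the notebook (for example, your local machine with AWS credentials configured). The function name and the notebook name in the usage comment are hypothetical.

```python
def stop_notebook_if_running(sagemaker_client, notebook_name):
    """Stop a SageMaker notebook instance if it is currently InService.

    `sagemaker_client` is expected to be boto3.client("sagemaker");
    `notebook_name` is whatever you named your notebook instance.
    Returns True if a stop request was sent, False otherwise.
    """
    info = sagemaker_client.describe_notebook_instance(NotebookInstanceName=notebook_name)
    if info["NotebookInstanceStatus"] == "InService":
        sagemaker_client.stop_notebook_instance(NotebookInstanceName=notebook_name)
        return True
    return False


# Usage (requires boto3 and AWS credentials):
# import boto3
# stop_notebook_if_running(boto3.client("sagemaker"), "my-recommender-notebook")
```

Checking the status first avoids sending a stop request to an instance that is already stopped or still starting.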
9. Blog post
Write a short blog post (this step is very important) explaining what you learned and how you approached the challenge. Link to your project on GitHub so we can review it.
When You’re Done
You can complete the project requirements by yourself or in collaboration with others. Feel free to ask questions in the discussion forum or on social media using the #CloudGuruChallenge hashtag!
When you finish all the steps of the project, post a link to your blog post in the designated forum thread. I will then be able to endorse you on LinkedIn for the skills you demonstrated in this project: machine learning, AWS, and Amazon SageMaker. (You’ll also be entered to win some cool swag!)
This challenge will remain available indefinitely, but to get endorsed on LinkedIn and win swag, you need to link your blog post on the forum by December 31, 2020.
Most importantly, the #CloudGuruChallenge is FREE and available to everyone: all you need is an ACG free-tier membership to make your forum posts.
Be prepared to do some Googling, but if you are an ACG member, here are some resources that can help you get more comfortable with machine learning, AWS, and SageMaker:
- Introduction to Machine Learning (6.5-hour course)
- Introduction to Jupyter Notebooks (AWS SageMaker) (1-hour hands-on lab)
- Introduction to Jupyter Notebooks (3-hour course)
- What is Amazon SageMaker (7-minute lesson from AWS Certified Machine Learning – Specialty (LA) Course)
You don’t need to perform these additional steps to “declare victory” on the challenge, but they will help your project stand out and provide awesome additional learning.
- Rank the items in order to recommend the most relevant items to the user
- Recommend only items the user hasn’t watched (or purchased)
- Integrate knowledge of your clusters into a front-end application to make movie (or product) recommendations
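The first two bonus ideas can be combined in a few lines of pandas. This sketch assumes a hypothetical `candidates` table (for example, titles from the user’s cluster) and a set of already-watched titles; ranking by average rating with vote count as a tie-breaker is one simple choice, and a weighted rating would be a reasonable alternative.

```python
import pandas as pd

# Hypothetical candidates from the user's cluster, plus a watch history.
candidates = pd.DataFrame(
    {
        "primaryTitle": ["Beta", "Gamma", "Eta", "Theta"],
        "averageRating": [7.9, 7.2, 8.4, 8.4],
        "numVotes": [1200, 50000, 300, 9000],
    }
)
watched = {"Gamma"}


def rank_unwatched(df, watched_titles, top_n=3):
    """Drop already-watched titles, then rank by rating, breaking ties by vote count."""
    unseen = df[~df["primaryTitle"].isin(watched_titles)]
    ranked = unseen.sort_values(["averageRating", "numVotes"], ascending=False)
    return ranked["primaryTitle"].head(top_n).tolist()


print(rank_unwatched(candidates, watched))  # ['Theta', 'Eta', 'Beta']
```

"Theta" and "Eta" tie on rating, so the title with more votes ranks first; "Gamma" never appears because it’s already been watched.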
This challenge will be a fun way to explore machine learning!
I’ve always wondered how Netflix recommended movies to me. Before creating this challenge for you, I solved it myself! If you’re a machine learning first-timer, this will be a lot of fun. And if you’re already a machine learning pro, you’ll have an opportunity to explore machine learning in the cloud using Amazon SageMaker. I can’t wait to share my recommendation engine code with you at the end of this challenge and review how you solved the problem!
Get through this, and you’ll have a great story to bring up in your next job interview. Good luck!