#CloudGuruChallenge – Machine Learning on AWS

Kesha Williams

What’s the #CloudGuruChallenge? Get some background and more info here.

Topic: Machine Learning on AWS
Creator: Kesha Williams
Goal: Build a Netflix-Style Recommendation Engine with Amazon SageMaker
Outcome: Gain real machine learning and AWS skills while getting hands-on with a real-world project to add to your portfolio
Deadline: December 31, 2020

Challenge Steps

Have you ever wondered how Netflix recommends movies to you? I’ve always been curious about the machine learning techniques and algorithms used behind the scenes to help me navigate the thousands of movies found on Netflix.

In this challenge, you’ll level up your machine learning skills by building a Netflix-style recommendation engine using Amazon SageMaker. Whether you’re a machine learning first-timer or a machine learning guru, there are aspects of this challenge that will advance your skills to the next level. So, let’s go!

You’ll need access to an AWS environment and Amazon SageMaker. As a part of the AWS Free Tier, you can get started with Amazon SageMaker for free. If you have never used Amazon SageMaker before, you’ll have free access to build and train your model. If you’re not familiar with machine learning, check out the first video of Kesha’s Korner to get up to speed. I also recommend exploring Matplotlib, scikit-learn, and the k-means clustering algorithm before starting this challenge.

1. Determine use case and obtain data

Determine what you’d like to recommend: movies, courses, videos, items, or something else. Then, find data or use your own. Several public data repositories, like the AWS Marketplace or the UCI Machine Learning Repository, may have the data you need. If you’d like to recommend movies, review the IMDb Datasets and download the files you need. You may find the title.akas.tsv.gz, title.basics.tsv.gz, and title.ratings.tsv.gz files particularly useful.

2. Create a hosted Jupyter notebook

To start the data inspection process, you’ll launch a hosted Jupyter notebook on Amazon SageMaker. You can use Python and data science libraries like NumPy and pandas (with its DataFrame) to work with your data.

3. Inspect and visualize data

It’s important to gain domain knowledge of your data so that you can easily detect anomalies and outliers. There are many ways to explore and get to know your data. Check out Matplotlib.
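One simple way to flag outliers while exploring is the interquartile range (IQR) rule, which is typically the same rule a box plot uses to draw its whiskers. The sketch below is only an illustration: the function name `iqr_outliers`, the multiplier of 1.5, and the sample runtimes are all assumptions, not part of the challenge.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the middle 50%."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Invented movie runtimes in minutes; 600 is very likely a data-entry error.
runtimes = [88, 92, 95, 101, 104, 110, 115, 600]
outliers = iqr_outliers(runtimes)
print(outliers)
```

Whether a flagged value is an error or a genuine extreme is a judgment call that depends on your domain knowledge of the data.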

4. Prepare and transform data

The next step is to put the data in a format a machine can learn from. You may have to combine disjointed data files into one, remove null values, convert strings to numbers, or do a little feature engineering.
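For the IMDb files, those preparation steps might look roughly like the sketch below. It uses only the standard library so it runs anywhere; in a notebook you would more likely do the same with pandas. It assumes the IMDb column names (`tconst`, `primaryTitle`, `startYear`, `genres`, `averageRating`, `numVotes`) and IMDb's `\N` marker for missing values. The sample rows and the helper names `read_tsv` and `prepare` are invented for illustration.

```python
import csv
import io

# Tiny in-memory stand-ins for title.basics.tsv and title.ratings.tsv.
# The rows are invented; IMDb marks missing values with \N.
BASICS_TSV = (
    "tconst\tprimaryTitle\tstartYear\tgenres\n"
    "tt001\tSpace Story\t1999\tSci-Fi,Drama\n"
    "tt002\tQuiet Town\t\\N\tDrama\n"
    "tt003\tLaugh Track\t2010\tComedy\n"
)
RATINGS_TSV = (
    "tconst\taverageRating\tnumVotes\n"
    "tt001\t8.1\t1200\n"
    "tt003\t6.4\t300\n"
)


def read_tsv(text):
    """Parse tab-separated text into a list of dicts, one per row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def prepare(basics_text, ratings_text):
    """Join the files on tconst, drop incomplete rows, build numeric features."""
    ratings = {row["tconst"]: row for row in read_tsv(ratings_text)}
    merged = []
    for row in read_tsv(basics_text):
        rating = ratings.get(row["tconst"])
        # Skip titles with no rating or with any missing (\N) field.
        if rating is None or "\\N" in row.values():
            continue
        merged.append({**row, **rating})

    # One-hot encode genres so every title becomes a purely numeric vector.
    all_genres = sorted({g for row in merged for g in row["genres"].split(",")})
    features = {}
    for row in merged:
        genres = set(row["genres"].split(","))
        features[row["primaryTitle"]] = (
            [float(row["startYear"]), float(row["averageRating"])]
            + [1.0 if g in genres else 0.0 for g in all_genres]
        )
    return all_genres, features


genre_columns, movie_features = prepare(BASICS_TSV, RATINGS_TSV)
print(genre_columns)
print(movie_features)
```

Note that the year and the one-hot genre flags sit on very different scales, so you would probably want to normalize the columns before feeding them to a distance-based algorithm.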

5. Train

Now that you’ve transformed the data, start the training process using your selected machine learning algorithm. The algorithm should cluster or group your data so that you’re able to make recommendations. Depending on how you’re solving this challenge, you may find k-means clustering useful. Amazon SageMaker provides a k-means clustering algorithm or you can explore scikit-learn’s version.
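To make the idea behind k-means concrete, here is a minimal pure-Python sketch of the standard assign-then-update loop (Lloyd's algorithm). It is not how Amazon SageMaker or scikit-learn implement it: the naive "first k points" initialization, the function name `kmeans`, and the sample points are assumptions chosen to keep the example short. For the challenge itself, one of the library versions mentioned above is the better choice.

```python
import math


def kmeans(points, k, max_iters=100):
    """Cluster points with Lloyd's algorithm; returns (centroids, labels)."""
    # Naive initialization: the first k points. Library implementations
    # typically use smarter schemes such as k-means++.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Invented 2-D points forming two obvious groups.
points = [[1.0, 1.0], [9.0, 9.0], [1.0, 2.0], [8.0, 9.0], [2.0, 1.0], [9.0, 8.0]]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

The number of clusters `k` is something you choose up front; trying a few values and comparing how tight the resulting clusters are is a common way to pick one.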

6. Recommend

Now that you’ve identified your clusters, recommend the items. If you’re recommending movies, this could be Python code that analyzes the clusters to find commonalities. Once you understand the commonalities, you’re able to find other movies that are similar to recommend. Congratulations on making it this far!
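One possible way to turn clusters into recommendations is to look up the cluster of a title the user liked and return the other members of that cluster, nearest first. The sketch below assumes you already have a feature vector and a cluster label per title; the dictionaries, the titles, and the function name `recommend_similar` are all hypothetical.

```python
import math

# Hypothetical output of the training step: a feature vector and a cluster
# label per title. All values are invented for illustration.
FEATURES = {
    "Space Story": [0.9, 0.1],
    "Star Voyage": [0.8, 0.2],
    "Moon Base": [0.7, 0.1],
    "Laugh Track": [0.1, 0.9],
    "Office Jokes": [0.2, 0.8],
}
CLUSTERS = {
    "Space Story": 0,
    "Star Voyage": 0,
    "Moon Base": 0,
    "Laugh Track": 1,
    "Office Jokes": 1,
}


def recommend_similar(title, features, clusters, top_n=3):
    """Return up to top_n titles from the same cluster, nearest first."""
    target = features[title]
    same_cluster = [
        other
        for other, label in clusters.items()
        if label == clusters[title] and other != title
    ]
    same_cluster.sort(key=lambda other: math.dist(features[other], target))
    return same_cluster[:top_n]


print(recommend_similar("Space Story", FEATURES, CLUSTERS))
```

Sorting by distance within the cluster is a design choice, not a requirement; you could just as well sort by rating or popularity.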

7. Source control

Now that you’re finished, upload your data files and Jupyter notebook to GitHub so that we can check out your recommendation engine.

8. Clean up resources

Don’t forget to clean up your resources! At a minimum, stop your Jupyter notebook from running so that you don’t incur hourly charges for using Amazon SageMaker.

9. Blog post

This step is very important: write a short blog post explaining what you learned and your approach to the challenge. Link to your project on GitHub so we can review it.

When You’re Done

You can complete the project requirements by yourself or in collaboration with others. Feel free to ask questions in the discussion forum or on social media using the #CloudGuruChallenge hashtag!

When you finish all the steps of the project, post a link to your blog post in the designated forum thread. I will then be able to endorse you on LinkedIn for the skills you demonstrated in this project: machine learning, AWS, and Amazon SageMaker. (You’ll also be entered to win some cool swag!)

This challenge will remain available indefinitely, but to get endorsed on LinkedIn and win swag, you need to link your blog post on the forum by December 31, 2020.

Most importantly, the #CloudGuruChallenge is FREE and available to everyone: all you need is an ACG free-tier membership to make your forum posts.

Resources

Be prepared to do some Googling, but if you are an ACG member, here are some resources that can help you get more comfortable with machine learning, AWS, and SageMaker:

Extra-Challenging Steps

You don’t need to perform these additional steps to “declare victory” on the challenge, but they will help your project stand out and provide awesome additional learning.

  1. Rank the items in order to recommend the most relevant items to the user
  2. Recommend only items the user hasn’t watched (or purchased)
  3. Integrate knowledge of your clusters into a front-end application to make movie (or product) recommendations.
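The first two extra steps could be sketched as follows. The ranking here uses a vote-weighted (Bayesian-style) average so that a high rating from a handful of votes does not outrank a well-established title; that choice, the `prior_mean` and `min_votes` values, the function names, and the sample data are all assumptions for illustration.

```python
def weighted_rating(rating, votes, prior_mean=7.0, min_votes=100):
    """Shrink a rating toward prior_mean when it rests on few votes."""
    return (votes * rating + min_votes * prior_mean) / (votes + min_votes)


def rank_unseen(candidates, watched, top_n=5):
    """Drop already-watched titles, then rank the rest by weighted rating."""
    unseen = [c for c in candidates if c["title"] not in watched]
    unseen.sort(key=lambda c: weighted_rating(c["rating"], c["votes"]), reverse=True)
    return [c["title"] for c in unseen[:top_n]]


# Invented candidates from one cluster, plus the user's watch history.
candidates = [
    {"title": "Star Voyage", "rating": 9.5, "votes": 4},
    {"title": "Moon Base", "rating": 8.0, "votes": 900},
    {"title": "Space Story", "rating": 8.5, "votes": 1500},
    {"title": "Comet Chasers", "rating": 6.0, "votes": 700},
]
watched = {"Space Story"}

print(rank_unseen(candidates, watched))
```

Here "Star Voyage" has the highest raw rating but only four votes, so it lands below "Moon Base" once the rating is weighted.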

Final Takeaways

This challenge will be a fun way to explore machine learning!

I’ve always wondered how Netflix recommended movies to me. Before creating this challenge for you, I solved it myself! If you’re a machine learning first-timer, this will be a lot of fun. And if you’re already a machine learning pro, you’ll have an opportunity to explore machine learning in the cloud using Amazon SageMaker. I can’t wait to share my recommendation engine code with you at the end of this challenge and review how you solved the problem!

Get through this, and you’ll have a great story to bring up in your next job interview. Good luck!
