Performing Real-Time Data Analysis with Kinesis

1 hour
  • 4 Learning Objectives

About this Hands-on Lab

Easily ingesting data from numerous sources and making timely decisions on it is becoming a core capability for many businesses. In this lab, we provide hands-on experience using Kinesis Data Firehose to capture and load data streams into Amazon S3, and using Kinesis Data Analytics to perform near real-time analysis on the stream.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Create a Kinesis Data Firehose Stream
  1. Log in to the AWS Console and navigate to Kinesis Data Firehose.
  2. Create a delivery stream named captains-kfh that will send our space captain scores to a new S3 bucket that you will create (a scripted sketch of this step follows this list).
    • To save time during the lab, set the buffer sizes to the minimum values so data gets flushed from the stream faster. In a real environment, you will need to tune these values based on what you’re doing with the data.
    • This lab isn’t focused on IAM, so an IAM role named FirehoseDeliveryRole (with some characters for uniqueness) has been provided for this stream. For an extra challenge, you can create your own role.
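
For reference, the same delivery stream could also be created from a script. Below is a minimal boto3 sketch, assuming placeholder values for the account ID, the provided FirehoseDeliveryRole ARN, and the S3 bucket you create; in the lab itself, the console wizard is the intended path.

    # Minimal sketch: create the captains-kfh delivery stream with boto3.
    # The role ARN and bucket ARN are placeholders -- substitute the values from your lab environment.
    import boto3

    firehose = boto3.client("firehose", region_name="us-east-1")

    firehose.create_delivery_stream(
        DeliveryStreamName="captains-kfh",
        DeliveryStreamType="DirectPut",
        ExtendedS3DestinationConfiguration={
            "RoleARN": "arn:aws:iam::<ACCOUNT_ID>:role/FirehoseDeliveryRole",  # provided lab role
            "BucketARN": "arn:aws:s3:::<your-captains-bucket>",                # bucket you create
            "BufferingHints": {
                "SizeInMBs": 1,           # minimum size, so data flushes quickly during the lab
                "IntervalInSeconds": 60,  # minimum interval
            },
        },
    )
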
Send Data to the Stream
  1. Log in to the provided server using the credentials in the lab.
  2. View the send_captains_to_cloud.py script in your user’s home directory.
  3. Run the send_captains_to_cloud.py script with Python 3 to generate and send data to Firehose (a sketch of what such a script might look like follows this list).
  4. Back in the AWS Console, monitor the Firehose stream to see data coming in.
    • This may take a minute to begin populating, so refresh a few times if you don’t see any data.
  5. Once you see data on the Console, go back to the server and stop the script.
  6. Pull the generated data from S3 onto the server, then inspect it. It should match what was printed in the terminal.
  7. Start the data generating script again so we have data coming into the stream.
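
The real send_captains_to_cloud.py lives in the lab's GitHub repository; the sketch below is only an approximation of what such a script might do, assuming made-up captain names and a 1-10 rating scale, and using boto3's put_record call to push newline-delimited JSON into the captains-kfh stream.

    # Rough approximation of a sender script -- the real send_captains_to_cloud.py is in the lab's GitHub repo.
    # Captain names and the 1-10 rating scale are assumptions for illustration.
    import json
    import random
    import time

    import boto3

    firehose = boto3.client("firehose", region_name="us-east-1")
    captains = ["Picard", "Janeway", "Kirk", "Sisko", "Adama"]

    while True:
        record = {"captain": random.choice(captains), "rating": random.randint(1, 10)}
        print(record)  # echo each record so the terminal output can be compared with S3 later
        firehose.put_record(
            DeliveryStreamName="captains-kfh",
            Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},  # newline-delimited JSON
        )
        time.sleep(0.5)
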
Find the Average Captain Ratings
  1. Create a new Kinesis Data Analytics application using the data from the captains-kfh stream.
    • Again, an IAM role has been provided. Feel free to use this, or for the extra challenge, create a new role yourself.
  2. Using the SQL editor, create a query that shows the average rating and total rating of each captain per minute (an example of this kind of streaming SQL follows this list).
  3. Save and run the query.
  4. After about a minute, you will see the results of your query streaming in.
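
Kinesis Data Analytics SQL applications use a streaming SQL dialect. To stay with the lab's Python tooling, the sketch below keeps the query in a plain string you could print and paste into the SQL editor; the captain and rating column names and the default SOURCE_SQL_STREAM_001 input stream name are assumptions about the schema the application discovers.

    # Sketch of a per-minute averages query for the SQL editor, held in a Python string.
    # Column names ("captain", "rating") and the default input stream name are assumptions.
    AVERAGE_RATINGS_SQL = """
    CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
        "captain"      VARCHAR(32),
        "avg_rating"   DOUBLE,
        "total_rating" INTEGER
    );

    CREATE OR REPLACE PUMP "AVERAGES_PUMP" AS
        INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM
            "captain",
            AVG("rating") AS "avg_rating",
            SUM("rating") AS "total_rating"
        FROM "SOURCE_SQL_STREAM_001"
        -- STEP(... BY INTERVAL '60' SECOND) gives a tumbling one-minute window
        GROUP BY
            "captain",
            STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
    """

    print(AVERAGE_RATINGS_SQL)  # copy the printed query into the SQL editor
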
Find Anomalous Captain Ratings
  1. Using the SQL editor, create a query that ranks the incoming captain ratings by how anomalous each rating is, displaying the most anomalous values first (an example follows this list).
  2. Save and run the query.
  3. After a few seconds, you will see the results of your query streaming in.
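
Anomaly scoring in Kinesis Data Analytics SQL is typically done with the built-in RANDOM_CUT_FOREST function. The sketch below follows that documented pattern, again held in a Python string for pasting into the SQL editor; the column names are the same assumptions as above.

    # Sketch of an anomaly-scoring query using RANDOM_CUT_FOREST, held in a Python string.
    # Column names are assumptions; ANOMALY_SCORE is the column the function adds to each row.
    ANOMALY_SQL = """
    CREATE OR REPLACE STREAM "TEMP_STREAM" (
        "captain"       VARCHAR(32),
        "rating"        INTEGER,
        "ANOMALY_SCORE" DOUBLE
    );

    CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
        "captain"       VARCHAR(32),
        "rating"        INTEGER,
        "ANOMALY_SCORE" DOUBLE
    );

    -- Score each incoming record against a random cut forest model built from the stream
    CREATE OR REPLACE PUMP "SCORE_PUMP" AS
        INSERT INTO "TEMP_STREAM"
        SELECT STREAM "captain", "rating", ANOMALY_SCORE
        FROM TABLE(RANDOM_CUT_FOREST(
            CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001")
        ));

    -- Emit scored records with the most anomalous ratings first
    CREATE OR REPLACE PUMP "OUTPUT_PUMP" AS
        INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM * FROM "TEMP_STREAM"
        ORDER BY FLOOR("TEMP_STREAM".ROWTIME TO SECOND), ANOMALY_SCORE DESC;
    """

    print(ANOMALY_SQL)  # copy the printed query into the SQL editor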

Additional Resources

Scenario

Our company, My Space Captain, runs a fan site for television and movies set in outer space. One of our most popular features allows users to rate their favorite space captains. Our boss has asked us to load user rating data into Amazon Simple Storage Service (S3) as a backup copy. The data ingestion needs to be reliable, and we need a solution with little to no ongoing administration. Additionally, the solution needs to scale automatically to meet demand. Kinesis Data Firehose is perfect for this situation.

Additionally, we've been asked to analyze the user ratings as they stream in so we can get a sense of which captains are most popular and identify any data anomalies. Eventually, our team will build capabilities to respond to customer data in real time. Kinesis Data Analytics pairs perfectly with Firehose for this task.

Lab Goals

  1. Create a Kinesis Data Firehose Stream
  2. Send Data to the Stream
  3. Find Averages of the Data Per Minute
  4. Find Anomalies in the Data

Helpful Resources

The code for this lab is available on GitHub.

Logging In to the Lab Environment

To avoid issues with the lab, use a new Incognito or Private browser window to log in to the lab. This ensures that your personal account credentials, which may be active in your main window, are not used for the lab.

Log in to the AWS console using the account credentials provided with the lab. Please make sure you are in the us-east-1 (N. Virginia) region when in the AWS console.

Use the provided username and password to SSH into the server at the public IP.

ssh <username>@PUBLIC_IP

What are Hands-on Labs

Hands-on Labs are real environments created by industry experts to help you learn. These environments help you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.
