Ingesting data from numerous sources and making timely decisions on it is becoming a core capability for many businesses. In this lab, we provide hands-on experience using Kinesis Data Firehose to capture and load data streams into Amazon S3, and using Kinesis Data Analytics to perform near real-time analysis on the stream.
Learning Objectives
Successfully complete this lab by achieving the following learning objectives:
- Create a Kinesis Data Firehose Stream
- Log in to the AWS Console and navigate to Kinesis Data Firehose.
- Create a delivery stream named captains-kfh that will send our space captain scores to a new S3 bucket that you will create.
- To save time during the lab, set the buffer sizes to the minimum values so data gets flushed from the stream faster. In a real environment, you will need to tune these values based on what you’re doing with the data.
- This lab isn’t focused on IAM, so an IAM role named FirehoseDeliveryRole (with some characters for uniqueness) has been provided for this stream. For an extra challenge, you can create your own role.
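The console steps above can also be scripted. The following is a minimal sketch, assuming boto3 is installed and credentials are configured. The bucket and role ARNs are hypothetical placeholders, and the default buffer values (1 MB, 60 seconds) are assumptions rather than confirmed minimums, so check the limits shown in the console before relying on them.

```python
def build_delivery_stream_request(stream_name, bucket_arn, role_arn,
                                  buffer_mb=1, buffer_seconds=60):
    """Build the keyword arguments for Firehose create_delivery_stream.

    buffer_mb and buffer_seconds default to small values so data is
    flushed to S3 quickly; these defaults are assumptions, not values
    taken from the lab.
    """
    return {
        "DeliveryStreamName": stream_name,
        # DirectPut: producers write straight to the delivery stream.
        "DeliveryStreamType": "DirectPut",
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
            "BufferingHints": {
                "SizeInMBs": buffer_mb,
                "IntervalInSeconds": buffer_seconds,
            },
        },
    }


def create_delivery_stream(request, client=None):
    """Submit the request. boto3 is imported lazily so the builder
    above can be used and tested without AWS access."""
    if client is None:
        import boto3
        client = boto3.client("firehose")
    return client.create_delivery_stream(**request)


# Hypothetical ARNs for illustration only; substitute your own.
request = build_delivery_stream_request(
    "captains-kfh",
    bucket_arn="arn:aws:s3:::example-captains-bucket",
    role_arn="arn:aws:iam::123456789012:role/FirehoseDeliveryRole-EXAMPLE",
)
```

Calling `create_delivery_stream(request)` would then create the stream in whichever account and region your credentials point at.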
- Send Data to the Stream
- Log in to the provided server using the credentials in the lab.
- View the send_captains_to_cloud.py script in your user’s home directory.
- Run the send_captains_to_cloud.py script using Python3 to generate and send data to Firehose. The generated data will be displayed in the terminal.
- Back in the AWS Console, monitor the Firehose stream to see data coming in.
- This may take a minute to begin populating, so refresh a few times if you don’t see any data.
- Once you see data on the Console, go back to the server and stop the script.
- Pull the generated data from S3 onto the server, then inspect it. It should match what was printed in the terminal.
- Start the data-generating script again so we have data coming into the stream.
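The contents of send_captains_to_cloud.py are not shown here, so the following is only a guess at the general shape of such a producer, not the lab's actual script. The captain names, the record fields (`captain`, `rating`, `timestamp`), and the rating range are all hypothetical. The Firehose call itself, `put_record`, is the standard boto3 method.

```python
import json
import random
import time

# Hypothetical captain names; the real script may use different ones.
CAPTAINS = ["Captain A", "Captain B", "Captain C"]


def make_score(rng=random):
    """Generate one fake rating record (field names are assumptions)."""
    return {
        "captain": rng.choice(CAPTAINS),
        "rating": rng.randint(1, 10),
        "timestamp": int(time.time()),
    }


def encode_record(score):
    """Serialize as one JSON document per line. The trailing newline
    keeps records separable once they are written to S3 together."""
    return (json.dumps(score) + "\n").encode("utf-8")


def send_score(client, stream_name, score):
    """Send a single record to the delivery stream."""
    return client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": encode_record(score)},
    )


def run(stream_name="captains-kfh", delay_seconds=0.5):
    """Generate, print, and send records until interrupted with Ctrl+C."""
    import boto3  # imported lazily so the helpers above work offline
    client = boto3.client("firehose")
    try:
        while True:
            score = make_score()
            print(score)
            send_score(client, stream_name, score)
            time.sleep(delay_seconds)
    except KeyboardInterrupt:
        pass
```

With newline-delimited JSON, the objects pulled down from S3 can be compared line by line against what was printed in the terminal.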
- Find the Average Captain Ratings
- Create a new Kinesis Data Analytics application using the data from the captains-kfh stream.
- Again, an IAM role has been provided. Feel free to use this, or for the extra challenge, create a new role yourself.
- Using the SQL editor, create a query that will show the average rating and total rating of each captain per minute.
- Check the Amazon Kinesis Data Analytics SQL Reference documentation for help.
- An example query has been provided for you on GitHub.
- Save and run the query.
- After about a minute, you will see the results of your query streaming in.
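To make the target of the averaging query concrete, here is a small local sketch in Python of the same aggregation: group records into one-minute windows per captain, then compute the average and total rating in each window. It is not the lab's SQL, and the field names (`captain`, `rating`, `timestamp` in epoch seconds) are assumptions about the record schema. A streaming query would typically window on arrival time rather than on a field in the record.

```python
from collections import defaultdict


def per_minute_stats(records):
    """Return {(minute_start, captain): {"average": ..., "total": ...}}.

    minute_start is the epoch second at which the one-minute window begins.
    """
    buckets = defaultdict(list)
    for record in records:
        # Round the timestamp down to the start of its minute.
        minute_start = record["timestamp"] - record["timestamp"] % 60
        buckets[(minute_start, record["captain"])].append(record["rating"])

    return {
        key: {"average": sum(ratings) / len(ratings), "total": sum(ratings)}
        for key, ratings in sorted(buckets.items())
    }


# Made-up sample data for illustration.
sample = [
    {"captain": "Captain A", "rating": 4, "timestamp": 0},
    {"captain": "Captain A", "rating": 8, "timestamp": 30},
    {"captain": "Captain A", "rating": 10, "timestamp": 61},
    {"captain": "Captain B", "rating": 3, "timestamp": 15},
]
stats = per_minute_stats(sample)
# stats[(0, "Captain A")] == {"average": 6.0, "total": 12}
```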
- Find Anomalous Captain Ratings
- Using the SQL editor, create a query that will rank the incoming captain ratings by how anomalous the rating is, displaying the most anomalous values first.
- Check the Random Cut Forest documentation for help.
- An example query has been provided for you on GitHub.
- Save and run the query.
- After a few seconds, you will see the results of your query streaming in.
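To illustrate what "rank by how anomalous the rating is" means, the sketch below scores each rating and sorts the most anomalous first. It deliberately swaps in a much simpler technique, the absolute z-score over a batch, and is not Random Cut Forest, which is what the lab's query uses. The field names are assumptions about the record schema.

```python
from statistics import mean, pstdev


def rank_by_anomaly(records):
    """Return (record, score) pairs, most anomalous first.

    The score is the absolute z-score of the rating within the batch:
    a simple stand-in for an anomaly score, NOT Random Cut Forest.
    """
    ratings = [record["rating"] for record in records]
    if not ratings:
        return []
    mu = mean(ratings)
    sigma = pstdev(ratings)
    if sigma == 0:
        # All ratings are identical, so nothing stands out.
        return [(record, 0.0) for record in records]
    scored = [(record, abs(record["rating"] - mu) / sigma) for record in records]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up sample data: one rating is far from the rest.
sample_ratings = [
    {"captain": "Captain A", "rating": 5},
    {"captain": "Captain B", "rating": 5},
    {"captain": "Captain C", "rating": 5},
    {"captain": "Captain A", "rating": 5},
    {"captain": "Captain B", "rating": 50},
]
ranked = rank_by_anomaly(sample_ratings)
# ranked[0][0] is the record with rating 50.
```

Unlike this batch calculation, a streaming anomaly detector updates its scores as each record arrives, which is why results in the lab appear within seconds.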