Ingesting data from numerous sources and making timely decisions on it is becoming a core capability for many businesses. In this lab, we provide hands-on experience using Amazon Kinesis Data Firehose to capture, transform, and load data streams into Amazon S3, and Amazon Kinesis Data Analytics to perform near real-time analytics on them.
### Lab Prerequisites
- Understand how to log in to and use the AWS Management Console.
- Understand Amazon Elastic Compute Cloud (EC2) basics.
- Understand AWS Command Line Interface (CLI) basics.
### Learning Objectives
Successfully complete this lab by achieving the following learning objectives:
#### Create a Kinesis Data Firehose Delivery Stream
- Log in to the AWS Management Console with the AWS Account information provided in the lab instructions.
- Navigate to the Kinesis service.
- Click Get started.
- Click Create Delivery Stream to create a Kinesis Data Firehose stream.
- Enter "captains-kfh" as the Delivery stream name.
- Click Next.
- Click Next.
- Select Amazon S3 as the destination.
- For the S3 Bucket, click Create New.
- Enter a globally unique bucket name, starting with "kfh-ml".
- Click Create S3 Bucket, then click Next.
- Enter "1" MB as the Buffer size -> Enter "60" seconds as the Buffer interval.
- Click Create new or choose for the IAM role.
- Select the IAM Role provided in the lab.
- Select the FirehoseDeliveryRole as the Policy Name.
- Click Allow.
- Click Next.
- Click Create delivery stream.
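The steps above configure the delivery stream entirely through the console. For reference, the minimal boto3 sketch below expresses the same configuration in code; it is not part of the lab, and the role ARN and bucket ARN are placeholders for the values created in this objective.

```python
# Sketch of the console steps above using boto3; the ARNs below are placeholders.
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="captains-kfh",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/FirehoseDeliveryRole",  # placeholder
        "BucketARN": "arn:aws:s3:::kfh-ml-example",                        # placeholder
        # Matches the console settings: 1 MB buffer size, 60-second buffer interval
        "BufferingHints": {"SizeInMBs": 1, "IntervalInSeconds": 60},
    },
)
```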
#### Stream Data to the New Kinesis Data Firehose Delivery Stream
- Open an SSH connection to the EC2 instance named Kinesis Test Server, using the credentials provided in your lab instructions.
- Run the following command:
  `python write-to-kinesis-firehose-space-captains.py`
- Return to the AWS Management Console and navigate to the S3 bucket created earlier.
- Refresh the S3 bucket view every 30 seconds and wait for records to appear; it may take about 60 seconds for the first objects to show up.
- Copy the name of the S3 bucket to the clipboard.
- Return to the terminal.
- Stop the Python script by pressing CTRL+C.
- Copy the files from S3 to the server (an example of this step is sketched after this list).
- Verify the contents of one of the files.
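The `write-to-kinesis-firehose-space-captains.py` script is provided on the lab instance and is not reproduced here. A producer of this kind generally follows the shape of the sketch below, pushing one newline-delimited JSON record at a time to the delivery stream with boto3; the `captain` and `rating` fields (and the captain names) are illustrative stand-ins, not the lab's actual schema.

```python
# Illustrative Firehose producer loop; field names are hypothetical, not the lab's schema.
import json
import random
import time

import boto3

firehose = boto3.client("firehose")
captains = ["Picard", "Janeway", "Sisko", "Kirk", "Archer"]

while True:
    record = {"captain": random.choice(captains), "rating": random.randint(1, 10)}
    firehose.put_record(
        DeliveryStreamName="captains-kfh",
        Record={"Data": json.dumps(record) + "\n"},  # newline-delimited JSON
    )
    time.sleep(1)
```

The copy-and-verify step can be done with the AWS CLI or an SDK. Assuming the instance has credentials that can read the bucket, downloading and inspecting the delivered objects looks roughly like this (substitute the bucket name you copied to the clipboard):

```python
# Illustrative check of the objects Firehose delivered to S3; BUCKET is a placeholder.
import boto3

BUCKET = "kfh-ml-example"  # replace with the bucket name copied earlier
s3 = boto3.client("s3")

for obj in s3.list_objects_v2(Bucket=BUCKET).get("Contents", []):
    body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
    print(obj["Key"], body[:200])  # first 200 bytes of each delivered batch
```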
#### Create a Kinesis Data Analytics Application
- Navigate back to the terminal. If the SSH session was terminated, log back in.
- Run the following command:
  `python write-to-kinesis-firehose-space-captains.py`
- Navigate to the Kinesis service in the AWS Management Console.
- Click Data Analytics on the left-side menu.
- Click Create application.
- Enter "popular-space-captains" as the Application name.
- Enter "popular-space-captains" as the Description.
- Ensure the SQL runtime is selected.
- Click Create application.
- Click Connect streaming data.
- Ensure Choose source is selected at the top.
- Select Kinesis Firehose delivery stream as the Source.
- Choose the captains-kfh delivery stream created earlier as the Kinesis Firehose delivery stream.
- Click Choose from IAM roles that Kinesis Analytics can assume.
- Choose the IAM role created for this lab.
- Click Discover schema.
- Click Save and continue.
- Click Go to SQL editor.
- Click Yes, start application.
- Open the “Using Kinesis Data Firehose and Kinesis Data Analytics Lab” GitHub repo provided in the lab instructions.
- Copy the SQL code from the kinesis-analytics-popular-captains.sql file and paste it into the SQL editor.
- Click Save and run SQL.
- View the real-time analytics results in the DESTINATION_CAPTAINS_SCORES stream.
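The SQL you paste comes from the kinesis-analytics-popular-captains.sql file in the lab repo and is not reproduced here. As a rough illustration only, a Kinesis Data Analytics SQL application that populates a DESTINATION_CAPTAINS_SCORES in-application stream typically follows a tumbling-window pattern like the sketch below; the "captain" column name is hypothetical.

```sql
-- Illustration only, not the lab's SQL file; "captain" is a hypothetical column name.
CREATE OR REPLACE STREAM "DESTINATION_CAPTAINS_SCORES" (
    "captain"    VARCHAR(64),
    "vote_count" INTEGER
);

CREATE OR REPLACE PUMP "CAPTAINS_PUMP" AS
    INSERT INTO "DESTINATION_CAPTAINS_SCORES"
    SELECT STREAM "captain", COUNT(*) AS "vote_count"
    FROM "SOURCE_SQL_STREAM_001"
    -- 60-second tumbling window over the default in-application input stream
    GROUP BY "captain",
             STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
```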
#### Create a Kinesis Data Analytics Anomaly Detection Application
- Open the “Using Kinesis Data Firehose and Kinesis Data Analytics Lab” GitHub repo provided in the lab instructions.
- Copy the SQL code from the kinesis-analytics-rating-anomaly.sql file and paste it into the SQL editor.
- Click Save and run SQL.
- View the real-time analytics results in the DESTINATION_SQL_STREAM stream.
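As with the previous objective, the actual SQL lives in kinesis-analytics-rating-anomaly.sql in the lab repo. Anomaly-detection applications in Kinesis Data Analytics generally wrap the input in the built-in RANDOM_CUT_FOREST function, roughly as sketched below; the "rating" column is a hypothetical numeric field, not necessarily the lab's schema.

```sql
-- Illustration only, not the lab's SQL file; "rating" is a hypothetical numeric column.
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "rating"        INTEGER,
    "ANOMALY_SCORE" DOUBLE
);

CREATE OR REPLACE PUMP "ANOMALY_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
    -- RANDOM_CUT_FOREST appends an ANOMALY_SCORE column to each input row
    SELECT STREAM "rating", ANOMALY_SCORE
    FROM TABLE(
        RANDOM_CUT_FOREST(
            CURSOR(SELECT STREAM "rating" FROM "SOURCE_SQL_STREAM_001")
        )
    );
```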