In this lab, students will set up a Kinesis Data Stream to process orders streaming in from customers who are actively placing online orders. The streaming data is then joined with data hosted in a DynamoDB table to enrich it. Once enriched, the data is delivered to a second Kinesis Data Stream for further processing by Kinesis Data Analytics. Using Kinesis Data Analytics, students will filter the results and store them in S3 using Kinesis Data Firehose.
Learning Objectives
Successfully complete this lab by achieving the following learning objectives:
- Create a Kinesis Data Stream
Navigate to Kinesis in the AWS console and create two Kinesis Data Streams, one for the incoming grocery orders and another for the enriched orders.
- Create a Lambda Function to Enrich Records
Create a Lambda function that is triggered by the incoming Kinesis Data Stream set up in the previous objective. Once triggered, the function should join the incoming records with the data from the users-information DynamoDB table by user_id, then write the results onto the second Kinesis Data Stream created in the previous objective.
- Start Streaming Data
Using the Kinesis Live application deployed during the lab setup, start streaming orders into your incoming Kinesis Data Stream.
- Filter Streaming Data with Kinesis Data Analytics
Create a Kinesis Data Analytics application with the data source as the enriched records Kinesis Data Stream. Write a SQL query to filter the results and only return the orders that have a total_cost of $100 or more.
- Create a Kinesis Data Firehose to Transform and Deliver the Final Results
Create a Kinesis Data Firehose delivery stream that will be the destination for the Kinesis Data Analytics application from the previous objective. Set up the Kinesis Data Firehose delivery stream to transform the records before they are delivered to S3. You can do this by creating a Lambda function. The transformation is simple: add a newline to each record. Finally, output the results into an S3 bucket and ensure the results are compressed.
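
The two Lambda functions described in the objectives above can be sketched roughly as follows. This is a minimal illustration, not the lab's reference solution. It assumes that user_id is the partition key of the users-information table, that orders arrive as JSON objects, and that the second stream is named enriched-orders (a hypothetical name). The helper names enrich_order and add_newline are also made up for this example.

```python
import base64
import json


def enrich_order(order, user):
    """Join one order with its user item; the order's own fields win on conflict."""
    return {**(user or {}), **order}


def enrich_handler(event, context):
    """Kinesis-triggered handler: look up each order's user and forward the result."""
    # Imported here so the pure helpers can be exercised without the AWS SDK installed.
    import boto3

    table = boto3.resource("dynamodb").Table("users-information")
    kinesis = boto3.client("kinesis")
    for record in event["Records"]:
        # Kinesis delivers each payload base64-encoded under record["kinesis"]["data"].
        order = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # Assumption: user_id is the table's partition key.
        user = table.get_item(Key={"user_id": order["user_id"]}).get("Item")
        enriched = enrich_order(order, user)
        kinesis.put_record(
            StreamName="enriched-orders",  # hypothetical stream name
            # default=str because DynamoDB returns numbers as Decimal values.
            Data=json.dumps(enriched, default=str).encode("utf-8"),
            PartitionKey=str(order["user_id"]),
        )


def add_newline(data_b64):
    """Decode a base64 payload, append a newline, and re-encode it."""
    payload = base64.b64decode(data_b64) + b"\n"
    return base64.b64encode(payload).decode("utf-8")


def firehose_handler(event, context):
    """Firehose transformation handler: return every record with a trailing newline."""
    return {
        "records": [
            {
                "recordId": record["recordId"],  # must echo the incoming record ID
                "result": "Ok",
                "data": add_newline(record["data"]),
            }
            for record in event["records"]
        ]
    }
```

The trailing newline matters because Firehose concatenates records as they are written to S3; without a delimiter, the JSON objects in each output file would run together on one line.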