2 Answers
Hi Jia,
You are correct that there are many different ways to take in masses of data. The idea of many labs is to give students hands-on familiarity with certain services, as was the case with the Twitter Maker Lab. I could have used SQS, could have used Lambda directly, maybe API Gateway to SNS, S3 directly, RDS, DynamoDB directly, or even JSON files dropped on EBS volumes for that matter.
Kinesis is best for real-time streaming data scenarios…think IoT sensors or telemetry use cases where you need to continuously accept and process a stream of data with almost no latency. SQS is more appropriate for store-and-forward scenarios.
In the end, it’s really up to you as a Solutions Architect to decide which tool in the toolbox is most appropriate.
–Scott
It’s worth remembering that the maker lab is a "contrived" example since it’s a demo. The point of Kinesis seems to be (please correct me if I’m wrong Scott) that you can get really high throughput from many data producers to many data consumers while also being able to analyze and transform the data as it streams through.
You can’t accomplish this with SQS alone, or S3 alone. Neither one supports stream processing or analytics. They are also very limited compared to Kinesis in the number of storage backends they can be configured to automatically ship the data off to (in fact, all you can really do is configure them to trigger Lambda functions). Also, SQS is a more traditional message queue in that it does not allow multiple consumers to see the same data at the same time. When one consumer pulls a message from the queue, it becomes invisible to other consumers for the duration of the visibility timeout. To fan that data out, you need to add SNS along with a separate queue for each consumer. With Kinesis, several consumers (for example, more than one Firehose delivery stream) can read the same stream, so each item goes to multiple places "automatically."
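To make the visibility-timeout point concrete, here is a minimal toy model in plain Python. It does not call AWS at all; `ToyQueue` and `ToyStream` are hypothetical classes I made up for illustration, and they only mimic the behavior described above (a queue hides a received message from other consumers, while a stream lets each consumer read every record from its own position).

```python
class ToyQueue:
    """SQS-like toy: a received message is hidden until its visibility timeout expires."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = []  # each entry is [body, invisible_until]

    def send(self, body):
        self._messages.append([body, 0.0])

    def receive(self, now):
        # Return the first visible message and hide it from other consumers.
        for msg in self._messages:
            if msg[1] <= now:
                msg[1] = now + self.visibility_timeout
                return msg[0]
        return None


class ToyStream:
    """Kinesis-like toy: records stay in the log; each consumer tracks its own position."""

    def __init__(self):
        self._records = []
        self._positions = {}  # consumer name -> index of next unread record

    def put(self, record):
        self._records.append(record)

    def read(self, consumer):
        # Return everything this consumer has not seen yet.
        pos = self._positions.get(consumer, 0)
        batch = self._records[pos:]
        self._positions[consumer] = len(self._records)
        return batch


queue = ToyQueue(visibility_timeout=30)
queue.send("tweet-1")
first = queue.receive(now=0)    # consumer A receives the message
second = queue.receive(now=1)   # consumer B gets nothing: the message is invisible
later = queue.receive(now=31)   # never deleted, so it reappears after the timeout

stream = ToyStream()
stream.put("tweet-1")
stream.put("tweet-2")
analytics = stream.read("analytics")  # sees both records
archiver = stream.read("archiver")    # also sees both records, independently
```

In the queue case only one consumer at a time can see a given message, which is why fan-out needs SNS plus one queue per consumer. In the stream case both readers get the full sequence without any extra plumbing.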
Yep, David, you’re exactly right. Most labs aren’t designed to be "reference models"; rather, their objective is to give people an excuse to get hands-on.
In the end we decided to go with the managed Kafka service; Kinesis just seems too expensive for high throughput.
I can relate to this question, and not just for Kinesis. Honestly, it’s getting to a point where I feel like Amazon creates product offerings just for fun. There are soooo many overlaps in services and features across the product lines. Perhaps it’s getting more annoying than confusing!