In the example of the Maker Lab with the Twitter API, if we remove Kinesis and go directly to the S3 bucket, wouldn't it all still work? At the end of the day the bottleneck will still be S3 and how fast it can ingest the data. And if you're worried about S3 not being able to absorb the load, wouldn't SQS help with that? It seems to me that Kinesis is just a glorified SQS queue that can read data in really fast, but at the end of the day it's just a queue for big amounts of data.
You are correct that there are many, many different ways to take in masses of data. The idea of many labs is to give students hands-on familiarity with certain services, as was the case with the Twitter Maker Lab. I could have used SQS, Lambda directly, maybe API Gateway to SNS, S3 directly, RDS, DynamoDB directly, or even JSON files dropped on EBS volumes for that matter.
Kinesis is best for real-time streaming data scenarios…think IoT sensors or telemetry use cases where you need to continuously accept and process a stream of data with almost no latency. SQS is more appropriate for store-and-forward scenarios.
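To make the real-time distinction a bit more concrete: Kinesis scales write throughput by splitting a stream into shards and routing each record by an MD5 hash of its partition key, so records for the same key stay ordered on the same shard (SQS has no notion of this). A rough pure-Python sketch of that routing, assuming evenly split hash-key ranges (`shard_for` is an illustrative helper, not an SDK call):

```python
import hashlib

def shard_for(partition_key: str, num_shards: int) -> int:
    # Kinesis hashes the partition key with MD5 into a 128-bit number
    # and delivers the record to whichever shard owns that slice of the
    # hash-key space. This sketch assumes evenly sized shard ranges.
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return h * num_shards // 2 ** 128

# Records with the same partition key (e.g. one Twitter user ID) always
# land on the same shard, preserving per-key ordering while spreading
# the overall load across shards.
shard_for("user-42", 4)  # deterministic: same key -> same shard
```

Adding shards raises the stream's total throughput, which is what "really high throughput from many producers" means in practice.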
In the end, it’s really up to you as a Solutions Architect to decide which tool in the toolbox is most appropriate.
It’s worth remembering that the Maker Lab is a "contrived" example since it’s a demo. The point of Kinesis seems to be (please correct me if I’m wrong, Scott) that you can get really high throughput from many data producers to many data consumers while also being able to analyze and transform the data as it streams through.
You can’t accomplish this with SQS alone, or S3 alone. Neither one supports stream processing or analytics. They are also very limited compared to Kinesis in the number of storage backends they can be configured to automatically ship the data off to (in fact, all you can really do is configure them to trigger Lambda functions). Also, SQS is a more traditional message queue in that it does not allow multiple consumers to see the same data at the same time. When one consumer pulls a message from the queue, it becomes invisible to other consumers for the duration of the visibility timeout. To fan that data out, you need to add SNS along with a separate queue for each consumer. With Kinesis, you can configure Firehose so that each item goes to multiple places "automatically."
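The visibility-timeout behavior described above can be modeled in a few lines of plain Python (`MiniQueue` is a toy model for illustration, not a real SDK class; real SQS defaults to a 30-second timeout):

```python
import time

class MiniQueue:
    """Toy model of SQS delivery semantics: once a consumer receives a
    message, it is invisible to every other consumer until the
    visibility timeout expires (or the consumer deletes it)."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self._messages = []  # each entry: [body, invisible_until_timestamp]

    def send(self, body):
        self._messages.append([body, 0.0])

    def receive(self, now=None):
        now = time.time() if now is None else now
        for msg in self._messages:
            if msg[1] <= now:  # currently visible
                msg[1] = now + self.visibility_timeout
                return msg[0]
        return None            # nothing visible right now

q = MiniQueue(visibility_timeout=30)
q.send("tweet #1")
q.receive(now=0)    # consumer A gets the message
q.receive(now=1)    # consumer B gets None -- the message is hidden
q.receive(now=31)   # timeout expired, so the message is redelivered
```

A Kinesis shard, by contrast, behaves like an ordered log: each consumer keeps its own iterator position, so multiple consumers can read the same records independently without any SNS fan-out plumbing.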
Yep, David, you’re exactly right. Most labs aren’t designed to be "reference models" but rather their objective is to provide people an excuse to get hands-on.
In the end we decided to go with the managed Kafka service; Kinesis just seemed too expensive at high throughput.
I can relate to this question, and not just for Kinesis. Honestly, it's getting to the point where I feel like Amazon creates product offerings just for fun. There are soooo many overlaps in services and features across the product lines. Perhaps it's getting more annoying than confusing!