The explanation changed the partition key from Date to Sensor. This removed the hot partition, because one day’s readings were then no longer stored in a single partition. The request was then ‘give me all the sensor readings for 2018-01-01’, which was spread over all the partitions.
My question is about whether this is correct.
In DynamoDB you can do either Scan or Query operations. Scans are expensive because they read every item in the table. Queries are better because they only hit the partition you are interested in, and with a sort key you can narrow down to the results you want.
With the ‘give me all the sensor readings for 2018-01-01’ request, it seems like you don’t know all the sensor IDs, so it has to be a scan, which will use up read capacity in proportion to the size of the table?
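To put a rough number on the scan cost, here is a back-of-the-envelope sketch. It assumes the documented read pricing of one read capacity unit per 4 KB for a strongly consistent read and half that for an eventually consistent one, and it ignores per-page rounding, so treat the result as an estimate only. The function name and the 1 GiB table size are made up for illustration.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024  # one read capacity unit covers up to 4 KB of data read


def estimate_scan_rcus(table_size_bytes: int, strongly_consistent: bool = False) -> float:
    """Rough read capacity consumed by scanning the whole table once.

    A filter expression does not reduce this: capacity is charged for the
    data read, not for the data returned.
    """
    blocks = math.ceil(table_size_bytes / RCU_BLOCK_BYTES)
    # Eventually consistent reads cost half as much as strongly consistent ones.
    return float(blocks) if strongly_consistent else blocks / 2


if __name__ == "__main__":
    one_gib = 1024 ** 3  # hypothetical table size
    print(estimate_scan_rcus(one_gib))
    print(estimate_scan_rcus(one_gib, strongly_consistent=True))
```

So even a modest table costs a lot to scan just to pull out one day of data.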
It might be that you do have the list of all the sensors, but then that’s N queries: ‘give me all the sensor readings for 2018-01-01 for sensor id #1’, then repeat for #2, #3, and so on (moving complexity into the application code).
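A minimal sketch of what that fan-out might look like, assuming the sensor ID is the partition key and the sort key is an ISO-8601 timestamp string. The table name `SensorReadings` and the attribute names `sensor_id` and `reading_time` are my own guesses, not from the course. The function only builds the request parameters, one Query per sensor.

```python
def build_daily_queries(sensor_ids, date, table_name="SensorReadings"):
    """Build one Query request per sensor for a single day.

    Assumed (hypothetical) schema: partition key `sensor_id`,
    sort key `reading_time` holding strings like '2018-01-01T09:30:00Z'.
    """
    return [
        {
            "TableName": table_name,
            # begins_with on the sort key selects every reading for that day
            "KeyConditionExpression": (
                "sensor_id = :sid AND begins_with(reading_time, :day)"
            ),
            "ExpressionAttributeValues": {
                ":sid": {"S": sensor_id},
                ":day": {"S": date},
            },
        }
        for sensor_id in sensor_ids
    ]


if __name__ == "__main__":
    for params in build_daily_queries(["Sen1", "Sen2", "Sen3"], "2018-01-01"):
        print(params["ExpressionAttributeValues"])
```

Each dict could then be passed to the low-level boto3 client as `client.query(**params)`, with result pagination handled per sensor. That is N round trips instead of one request, which is the complexity I mean.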
Please correct me if I’m wrong.
If I am correct, there is then a new question: how should such queries be done?
Due to the way DynamoDB works, such aggregation queries are a bit painful.
One option is to pre-compute an aggregation and store it in DynamoDB so that it is a single-item read. This can work by using DynamoDB Streams to trigger a Lambda function that does the aggregation. It might also be that, rather than aggregating over all devices, we group devices by user or region. With this we might then have the following pk and sk:
pk=USER sk=READING#DATE#SENSOR value=…
pk=USER sk=SUMMARY#DATE values=…
E.g.
If my sensors 1 and 2 each make a reading, that would result in the following DynamoDB actions:
pk=Andy sk=READING#2018-01-01#Sen1 value=1 (Single write, causes DynamoDB stream trigger causing..)
pk=Andy sk=SUMMARY#2018-01-01 values=[1]
Later sensor 2 writes its data
pk=Andy sk=READING#2018-01-01#Sen2 value=3 (again, this write triggers an update of ..)
pk=Andy sk=SUMMARY#2018-01-01 values=[1,3]
There are then 3 items in the table, 2 readings and 1 summary.
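To check the item count, here is a small in-memory simulation of the sequence above. A plain dict stands in for the table, and a direct function call stands in for the stream-triggered Lambda (in real DynamoDB the stream delivery is asynchronous). All function names are hypothetical.

```python
def recompute_summary(table, user, date):
    """Stand-in for the stream-triggered Lambda: rebuild the day's summary item."""
    prefix = f"READING#{date}#"
    values = [
        item["value"]
        for (pk, sk), item in sorted(table.items())  # sorted by (pk, sk), like a Query
        if pk == user and sk.startswith(prefix)
    ]
    table[(user, f"SUMMARY#{date}")] = {"values": values}


def write_reading(table, user, date, sensor, value):
    """Single write of a reading, followed by the simulated stream trigger."""
    table[(user, f"READING#{date}#{sensor}")] = {"value": value}
    recompute_summary(table, user, date)


if __name__ == "__main__":
    table = {}  # (pk, sk) -> item
    write_reading(table, "Andy", "2018-01-01", "Sen1", 1)
    write_reading(table, "Andy", "2018-01-01", "Sen2", 3)
    for key, item in sorted(table.items()):
        print(key, item)
```

After the two writes the dict holds the two READING items plus one SUMMARY item whose values are [1, 3].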
We can then answer the question ‘give me all of Andy’s sensor readings for 2018-01-01’ with a single-item read: pk of ‘Andy’, sk of ‘SUMMARY#2018-01-01’.
In the Lambda triggered by DynamoDB Streams, the query is possible with a pk of ‘Andy’ and a sort key range between ‘READING#2018-01-01#R’ and ‘READING#2018-01-01#T’, which will pick up all of Andy’s sensors’ readings for that day, because the sensor names start with ‘S’.
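As a sketch, the range query could be expressed with the following request parameters. The key attribute names `pk` and `sk` follow my notation above, the table name `UserSensorData` is invented, and the small `in_range` helper only mimics the inclusive BETWEEN comparison on ASCII strings so the bounds can be checked locally.

```python
def build_readings_range_query(user, date, table_name="UserSensorData"):
    """Query parameters for all of one user's readings on one day."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "pk = :pk AND sk BETWEEN :lo AND :hi",
        "ExpressionAttributeValues": {
            ":pk": {"S": user},
            ":lo": {"S": f"READING#{date}#R"},
            ":hi": {"S": f"READING#{date}#T"},
        },
    }


def in_range(sort_key, params):
    """Local check of the BETWEEN bounds (inclusive, plain string ordering)."""
    values = params["ExpressionAttributeValues"]
    return values[":lo"]["S"] <= sort_key <= values[":hi"]["S"]


if __name__ == "__main__":
    params = build_readings_range_query("Andy", "2018-01-01")
    for sk in ("READING#2018-01-01#Sen1", "SUMMARY#2018-01-01"):
        print(sk, in_range(sk, params))
```

The SUMMARY item and readings from other days fall outside the range, so the Lambda only reads the items it needs. A `begins_with(sk, :prefix)` condition on ‘READING#2018-01-01#’ would be an alternative that does not depend on sensor names starting with ‘S’.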
This does make a hot, or mildly warm, partition for ‘Andy’. However, we expect there to be other groups of sensors for ‘Bob’ and ‘Alice’ too, which would spread the load across partitions.
Sorry for the long post. I am enjoying the course. Keep up the good work.