In the lesson on DynamoDB and architecting to scale, it mentions that the date key matters, and that if we somehow increase variability in the dates it will lead to more spread-out keys when hashed. This is simply untrue, unless DynamoDB uses a bad or non-cryptographic hash function. Even then, if varying the data leads to more spread-out keys, you've got a terrible hash function on your hands. One necessary quality of a good hash function is that it spreads keys out evenly regardless of the input. Is DynamoDB using a bad hash function to hash the partition keys, or is the video inaccurate?
Perhaps this document on Write Sharding will help to explain.
"…in the case of a partition key that represents today’s date, you might choose a random number between 1 and 200 and concatenate it as a suffix to the date. This yields partition key values like 2014-07-09.1, 2014-07-09.2, and so on, through 2014-07-09.200. Because you are randomizing the partition key, the writes to the table on each day are spread evenly across multiple partitions. This results in better parallelism and higher overall throughput."
Jia, I think I see your confusion. What they are saying is not "more variable input creates more variable hashes" but rather "static input creates the same hash". Imagine every record created on the same day has the same date (e.g. 2019-06-27): that date will always produce the same hash, and so every record will land in the same partition. As you point out, as the date changes the hashing algorithm will properly and evenly distribute the records across the partitions, but within a single day all the records will be in the same partition because the date is static. This is why Ben mentioned adding a random component to the date to cause the hash to vary.
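To make that concrete, here's a minimal Python sketch of the write-sharding idea from the AWS doc: append a random suffix (1–200) to the date so that writes made on the same day produce many distinct partition key values instead of one. The function name and shard count are illustrative choices, not part of any DynamoDB API.

```python
import random
from datetime import date

# Number of shard suffixes; the AWS doc's example uses 200.
NUM_SHARDS = 200

def sharded_partition_key(d: date, num_shards: int = NUM_SHARDS) -> str:
    """Return a partition key like '2014-07-09.137'.

    Without the suffix, every write on a given day would use the same
    key and therefore hash to the same partition; the random suffix
    spreads the day's writes across up to `num_shards` key values.
    """
    suffix = random.randint(1, num_shards)
    return f"{d.isoformat()}.{suffix}"

# Many writes on the same day now produce many distinct keys.
keys = {sharded_partition_key(date(2014, 7, 9)) for _ in range(1000)}
print(len(keys) > 1)
```

Reading such data back requires querying all the suffixes and merging the results, which is the trade-off the Write Sharding document describes.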