AWS Certified Solutions Architect - Professional 2020


If DynamoDB uses a good hash function (good entropy and bit independence), the key we set won't matter anymore, will it?

In the lesson on DynamoDB and architecting to scale, it mentions that the date key matters: if we somehow increase variability in the dates, it will lead to more spread-out keys when hashed. This is simply untrue, unless DynamoDB uses a bad or non-cryptographic hash function. Even then, if varying the data leads to more spread-out keys, you've got a terrible hash function on your hands. A necessary quality of a good hash function is that it spreads the keys out evenly, independent of the input. Is DynamoDB using a bad hash function to hash the partition keys, or is the video inaccurate?

2 Answers

Perhaps this document on Write Sharding will help to explain.

"…in the case of a partition key that represents today’s date, you might choose a random number between 1 and 200 and concatenate it as a suffix to the date. This yields partition key values like 2014-07-09.1, 2014-07-09.2, and so on, through 2014-07-09.200. Because you are randomizing the partition key, the writes to the table on each day are spread evenly across multiple partitions. This results in better parallelism and higher overall throughput."
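A minimal sketch of the suffix scheme in the quote, assuming plain string keys. The names `SHARD_COUNT`, `sharded_key`, and `all_keys_for` are made up for illustration and are not part of any AWS SDK. It also shows the cost of the approach: to read everything for one day, you have to query every suffix.

```python
import random

SHARD_COUNT = 200  # matches the 1..200 range in the quoted example


def sharded_key(date, rng):
    """Build a write key such as '2014-07-09.137' by appending a random suffix."""
    return f"{date}.{rng.randint(1, SHARD_COUNT)}"


def all_keys_for(date):
    """List every key a reader must query to collect one day's items."""
    return [f"{date}.{n}" for n in range(1, SHARD_COUNT + 1)]


rng = random.Random()  # hypothetical source of randomness for the suffix
write_key = sharded_key("2014-07-09", rng)
read_keys = all_keys_for("2014-07-09")
```

Each write picks one of the 200 keys at random, so a single day's writes are no longer tied to one partition key value.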

jia chen

Hi Ben! Thanks for your reply! I still can't seem to wrap my head around it. ^ This would make a lot more sense if the keys weren't hashed: then introducing variability in the keys would lead to more spread-out data. But the fact that they're being hashed makes it super counterintuitive to me.

jia chen

Unless they do something like Base64-encode the key and take the last n bits, in which case it's a case study in a bad hashing algorithm.

Jia, I think I see your confusion. They are not saying "more variable input creates more variable hashes"; they are saying "static input creates the same hash". Every record created on the same day has the same date (e.g. 2019-06-27), and that date always produces the same hash and therefore always lands in the same partition. As you point out, as the date changes the hashing algorithm will properly and evenly distribute the records across the partitions, but during any single day all the records go to the same partition because the date is static. This is why Ben mentioned adding a random component to the date to cause the hash to vary.
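To make this concrete, here is a rough simulation. DynamoDB's internal hash function is not public, so MD5 is used purely as a stand-in for "a good hash", and `NUM_PARTITIONS` and `partition_for` are hypothetical names chosen for the example. The point holds for any deterministic hash: an identical key always maps to the same partition, no matter how good the hash is.

```python
import hashlib
import random
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical partition count, for illustration only


def partition_for(key):
    """Map a key to a partition with a stand-in hash (MD5, not DynamoDB's real one)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


rng = random.Random(0)  # seeded so the run is repeatable

# 10,000 writes on the same day, all with the same static date key.
static_keys = ["2019-06-27"] * 10_000
# The same 10,000 writes with a random suffix between 1 and 200 appended.
sharded_keys = [f"2019-06-27.{rng.randint(1, 200)}" for _ in range(10_000)]

static_counts = Counter(partition_for(k) for k in static_keys)
sharded_counts = Counter(partition_for(k) for k in sharded_keys)

print("static date key :", dict(static_counts))
print("date + suffix   :", dict(sharded_counts))
```

With the static key, all 10,000 writes land in one partition. With the suffix, the hash sees up to 200 distinct inputs and the writes spread across partitions.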
