When you create tables in DynamoDB, you must specify primary key attributes. These primary key attributes can then be used to retrieve data from your tables. To find your data more efficiently, DynamoDB creates indexes for those attributes. Sometimes, however, you may need to query data using an attribute that’s not in one of your primary keys. This is where secondary indexes can help.
In the example above, we have a Reply table for our forums. Id is our partition key and ReplyDateTime is our sort key; together, these two attributes make up our primary key, which can be used to query data.
Say we want all of the replies in the Amazon DynamoDB#DynamoDB Thread 2 thread. We can query for that Id, and DynamoDB, in turn, gives us the 3 items that match this query. Since we have ReplyDateTime as a sort key, we could also ask only for replies after a specified date.
But what if we wanted to query by the users who posted replies? We can’t use the PostedBy attribute to pull up all of User A’s messages because it is not a key attribute.
This is where secondary indexes come into play. Depending on the type of secondary index that we use, we could set PostedBy as an alternate sort key (in addition to the ReplyDateTime sort key) or even set PostedBy as a partition key! Both of these options give us the ability to find messages by user.
We can also create these secondary indexes so that they project only some of the table’s attributes. For example, say you want to have a panel that shows all activity for the Amazon DynamoDB forum, including Id, ReplyDateTime, and PostedBy, but you don’t need to see the messages in this panel. Instead of creating a secondary index that just duplicates the Reply table, we can choose to project only those 3 attributes and not the Message attribute.
Since the Message attribute could potentially hold the most data (these are user-submitted messages), leaving it out of our secondary index reduces the amount of data each query has to read, and thus the read capacity needed to populate that panel, benefiting our performance/cost ratio.
Great! So now we know why we need secondary indexes, but which secondary indexes can we use? What are the options, and how are they different?
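To make the primary-key queries described above concrete, here is a minimal sketch that builds the arguments for a DynamoDB Query request against the Reply table. It assumes ReplyDateTime is stored as an ISO 8601 string; the helper name `reply_query_params` and the example date are hypothetical.

```python
def reply_query_params(thread_id, after=None):
    """Build keyword arguments for a Query on the Reply table.

    Assumption: ReplyDateTime is a string attribute holding ISO 8601
    timestamps, so string comparison matches chronological order.
    """
    params = {
        "TableName": "Reply",
        # The partition key must be matched with an equality condition.
        "KeyConditionExpression": "Id = :id",
        "ExpressionAttributeValues": {":id": {"S": thread_id}},
    }
    if after is not None:
        # The sort key lets us narrow the result to a range of replies.
        params["KeyConditionExpression"] += " AND ReplyDateTime > :after"
        params["ExpressionAttributeValues"][":after"] = {"S": after}
    return params


# Hypothetical example date, for illustration only.
params = reply_query_params(
    "Amazon DynamoDB#DynamoDB Thread 2", after="2015-09-22T00:00:00Z"
)
```

With boto3 installed and credentials configured, these arguments could be passed along as `boto3.client("dynamodb").query(**params)`. Note that there is no way to add a PostedBy condition to this key expression, which is exactly the gap secondary indexes fill.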
Secondary Indexes: Local and Global
There are two types of secondary indexes, local and global, and they have slightly different characteristics.
Local Secondary Indexes
These are probably the easiest to understand because they share their table’s partition key, but give us the option to have more sort keys. In fact, local indexes give us the option to have up to 5 more sort keys because you can create up to 5 local secondary indexes per table. This is in addition to the sort key you already have on the table, for a total of 6 sort keys.
Be aware that local secondary indexes share provisioned throughput (read/write capacity) with their parent table. We need to account for this when allocating read and write capacity on tables. Throughput is covered extensively in our courses.
Local indexes must be created when we create the table. We cannot add them after the table is created, nor can we delete them! Plan this out carefully.
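As an illustration of a local secondary index, the sketch below builds the arguments for a CreateTable request that defines the Reply table together with one local index whose sort key is PostedBy. The index name `PostedBy-index`, the string attribute types, and the capacity values are assumptions made for this example.

```python
def reply_table_with_lsi():
    """Build CreateTable arguments for Reply with one local secondary index."""
    return {
        "TableName": "Reply",
        # Every attribute used in any key schema must be declared here.
        "AttributeDefinitions": [
            {"AttributeName": "Id", "AttributeType": "S"},
            {"AttributeName": "ReplyDateTime", "AttributeType": "S"},
            {"AttributeName": "PostedBy", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "Id", "KeyType": "HASH"},              # partition key
            {"AttributeName": "ReplyDateTime", "KeyType": "RANGE"},  # sort key
        ],
        "LocalSecondaryIndexes": [
            {
                "IndexName": "PostedBy-index",  # hypothetical name
                "KeySchema": [
                    # Same partition key as the table, different sort key.
                    {"AttributeName": "Id", "KeyType": "HASH"},
                    {"AttributeName": "PostedBy", "KeyType": "RANGE"},
                ],
                # KEYS_ONLY projects Id, ReplyDateTime, and PostedBy,
                # leaving the Message attribute out of the index.
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        ],
        # Assumed capacity values; the local index draws from this same pool.
        "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    }


table_params = reply_table_with_lsi()
```

The index definition carries no throughput settings of its own, which reflects the shared capacity described above. With boto3, the arguments could be used as `boto3.client("dynamodb").create_table(**table_params)`.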
Global Secondary Indexes
Global indexes have a few major differences compared to local indexes:
- They can be added on to existing tables
- They have their own provisioned throughput
- They can have different partition and sort keys from the parent table
The first difference can give us a lot of flexibility. Sometimes our needs change as our data or traffic grows, and having the ability to add indexes as we need them is a big bonus.
The second difference can completely change how you calculate the necessary read and write capacity units for a table and index, and can also make a difference in cost.
The third difference again gives us greater flexibility. Whereas with local secondary indexes we had to have a composite key (partition key + sort key), and we had to use the same partition key, global indexes completely change that. We can have a simple primary key (just a partition key) or a composite key, and they can be completely different from that of the table’s keys. That’s why they’re called global: queries on the index can span all of the data in a table, across all partitions.
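The three differences can be seen together in a sketch of an UpdateTable request that adds a global secondary index to an existing Reply table. The index name, the choice of ReplyDateTime as the index sort key, and the capacity numbers are assumptions for illustration.

```python
def add_posted_by_gsi(read_units, write_units):
    """Build UpdateTable arguments that add a global secondary index to Reply."""
    return {
        "TableName": "Reply",
        # Declare the attributes that the new index uses as keys.
        "AttributeDefinitions": [
            {"AttributeName": "PostedBy", "AttributeType": "S"},
            {"AttributeName": "ReplyDateTime", "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": "PostedBy-ReplyDateTime-index",  # hypothetical name
                    "KeySchema": [
                        # A partition key that differs from the table's (Id).
                        {"AttributeName": "PostedBy", "KeyType": "HASH"},
                        {"AttributeName": "ReplyDateTime", "KeyType": "RANGE"},
                    ],
                    # Table and index keys only; Message is not projected.
                    "Projection": {"ProjectionType": "KEYS_ONLY"},
                    # The index has capacity separate from the table's.
                    "ProvisionedThroughput": {
                        "ReadCapacityUnits": read_units,
                        "WriteCapacityUnits": write_units,
                    },
                }
            }
        ],
    }


gsi_params = add_posted_by_gsi(read_units=5, write_units=5)
```

With boto3, this could be applied to a live table as `boto3.client("dynamodb").update_table(**gsi_params)`. Once the index is active, a Query that names it through the `IndexName` argument can fetch all of User A's replies regardless of thread.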
Can I Always Choose Global Secondary Indexes?
After reading this, it might sound like global secondary indexes are always the best choice. And while they do offer more flexibility, that comes at a price. Since global indexes have their own read and write capacity, that price is a literal one: you pay for the index’s capacity on top of the table’s.
On top of the dent in your wallet, global indexes have another downside. They only support eventually consistent reads. While this can be a good thing, since eventually consistent reads use only half as many read capacity units as strongly consistent reads, it could mean that you will sometimes receive stale data. Amazon claims that the data update will usually propagate in under a second; however, there is no guarantee of this, and your application must be built with eventual consistency concerns in mind.
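To put a number on the "half as many read capacity units" point, here is a rough estimate in code. It assumes the commonly documented rule for provisioned capacity: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB, with larger items rounded up to the next 4 KB. The function name and the workload figures are hypothetical, and real numbers should be checked against the AWS documentation.

```python
import math


def read_capacity_units(item_size_bytes, reads_per_second, eventually_consistent=False):
    """Estimate the provisioned read capacity units for a steady read workload."""
    # Each read is billed in 4 KB steps, rounded up.
    units_per_read = math.ceil(item_size_bytes / 4096)
    total = units_per_read * reads_per_second
    if eventually_consistent:
        # Eventually consistent reads cost half as much.
        total = math.ceil(total / 2)
    return total


# Hypothetical workload: 6 KB items read 10 times per second.
strong = read_capacity_units(6 * 1024, 10)
eventual = read_capacity_units(6 * 1024, 10, eventually_consistent=True)
print(strong, eventual)  # prints "20 10"
```

Because global indexes only offer eventually consistent reads, the lower figure is the one that applies when sizing a global index for this workload.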
We’ve taken a look at some of the most significant differences between the two available secondary indexes for DynamoDB. We’ve also explained when secondary indexes can be useful, and when you should choose one secondary index over the other. But what should you learn next?
If you are not familiar with the DynamoDB concepts of tables, primary keys, attributes, and the operations you can perform on tables, I recommend backtracking just a little bit before going forward by reviewing DynamoDB concepts.
If you are familiar with those concepts, I would recommend moving forward and learning more about read/write capacities and how they affect both tables and secondary indexes. Those are critical to understanding DynamoDB.
If you’re a member, we take a more in-depth look at DynamoDB in the AWS Certified Developer – Associate training course, covering both secondary indexes and provisioned throughput calculations. We also cover the basics, so you don’t need any experience with DynamoDB to succeed in this self-paced course.
Of course, you can also check out the AWS docs:
- Provisioned Throughput Intro
- Throughput Considerations for Global Secondary Indexes
- Throughput Considerations for Local Secondary Indexes