Optimize Query Performance in Cosmos DB for NoSQL

30 minutes
  • 3 Learning Objectives

About this Hands-on Lab

In this lab, you will be presented with a short series of real-world scenarios that describe performance issues in a Cosmos DB implementation. For each scenario, you will be provided with a list of optimization ideas to research and consider. Some of the suggested optimizations include adjusting throughput on a container, customizing or adjusting indexing policies, Time to Live (TTL) settings, and consistency levels. After researching and selecting the optimization idea most likely to introduce improvement, you will use the Azure portal to apply the solution.

No coding experience is required, but familiarity working with Azure Cosmos DB for NoSQL databases, and navigating the Azure portal will provide you with a headstart. However, novices who are willing to do a little research using the links provided, along with the lab guide and solution videos, should also be able to complete the lab successfully.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Scenario 1: Optimize for Inconsistent Request Traffic

Your first consultation is with a team the works with the ADT (admissions, discharges, and transfers) data for a hospital administrative application. ADT messages arrive in the Messages container on the ADT database. The team is plagued by 429 errors and clear evidence of throttling during high-traffic times that coincide with elective surgery check-ins early in the morning, and late evening admissions in emergency rooms. Team members have tried setting the container throughput to 10,000 RU/s to prevent throttling during high-traffic periods, but more than half the time, the container is consuming only about 4,000 RU/s. The team lead knows they are underutilizing the higher throughput setting and wasting a lot of money, but she is not sure how to address it. Given the optimization ideas below, what do you recommend? You may want to navigate to the Cosmos DB account, database, and container to check current settings before you settle on any solutions.

After selecting one or more optimization ideas, navigate to the appropriate resource(s) and apply your solution(s).

Optimization Ideas

  • Modify throughput on a container
  • Modify the indexing policy on a container
  • Modify TTL on a container
  • Modify the consistency level on the account
Scenario 2: Optimize Common Read Queries

Your second meeting is with a data engineer, who supports the data analytics team. Users are complaining of extremely slow performance for nearly all query executions. The analytics team works with data on the Analytics database, in a container called PrimaryCareProviders.

The data engineer has identified the fact that nearly all of the queries are identical, except that the users return different properties in different forms, depending on the report or visualization they are populating. Specifically, the WHERE clause is very common:

SELECT
c.last_name
,c.gender
,c.age
,c.city
,c.pcp_id
FROM c
WHERE c.pcp_id = [some value]

The engineer has also identified a few queries that fail entirely if they include this ORDER BY clause:
ORDER BY c.age

What can you suggest? You may want to navigate to the Cosmos DB account, database, and container to check current settings before you settle on any solutions.

After selecting one or more optimization ideas, navigate to the appropriate resource(s) and apply your solution(s).

Optimization Ideas

  • Modify throughput on a container
  • Modify the indexing policy on a container
  • Modify TTL on a container
  • Modify the consistency level on the account
Scenario 3: Reduce Storage Costs

Your third consultation is with the Internet of Things (IoT) team, who manage hospital sensors, which stream device messages to a container on Cosmos DB for NoSQL. The messages are first stored in a container called DeviceIntake, and then an Azure Function with a trigger on the DeviceIntake change feed sends the data along to Event Hubs. An Azure Stream Analytics job pulls from Event Hubs to transform and augment the data before sending it along to another Cosmos DB container called DeviceMessages. Both containers are on a database called IoT_Team. The architecture is working well, which is somewhat surprising, given that the team spun up both containers with just the normal default settings and have made no adjustments to those defaults.

No clients read from the DeviceIntake container; once an audit process runs every night to ensure that all raw messages in DeviceIntake have safely arrived in DeviceMessages, no other read operations are performed on that container. All read operations are executed against DeviceMessages, and the typical access pattern involves point reads; SQL queries are rare.

There are no reported performance issues, but upper management has seem some cost management reports from the IT manager, and they are very concerned by how rapidly the storage costs are racking up. Given that the cost is high enough to get the attention of upper management, the team is eager to hear your ideas for quickly addressing storage costs.

You may want to navigate to the Cosmos DB account, database, and containers to check current settings before you settle on any solutions.

After selecting one or more optimization ideas, navigate to the appropriate resource(s) and apply your solution(s).

Optimization Ideas

  • Modify throughput on a container
  • Modify the indexing policy on a container
  • Modify TTL on a container
  • Modify the consistency level on the account

Additional Resources

Imagine you are a consultant, and you've been brought into a start-up healthcare software company to help tune their Cosmos DB for NoSQL implementation. They are Cosmos novices, so their issues are not complicated, but solutions are urgently needed. They are looking for your advice on some quick-fix solutions. You will meet with three different teams, who each have a troublesome scenario.

Before tackling this first objective, be sure that you have logged in to the Azure portal and that you can see the Azure Cosmos DB for NoSQL account deployed for you in the resource group. Navigate to the account and check that you have three databases: ADT (with one container), Analytics (with one container), and IoT_Team (with two containers).

For all three scenarios, you will select from the four general optimizations below. Each optimization idea may be applied in more than one scenario — or not at all. In one case (we don't tell you which one), the optimal adjustment involves two of the optimization ideas.

Optimization Ideas

  • Modify throughput on a container
  • Modify the indexing policy on a container
  • Modify TTL on a container
  • Modify the consistency level on the account

Data and Testing

There is no data in any of the containers deployed in the lab, so you will not be able to test your ideas. Of course, in a live environment, you would apply and test your solutions in a test account, with a non-trivial amount of data; however, that is not a requirement to complete the lab objectives.

Research Links

Indexing Policies in Azure Cosmos DB

Provision Standard Throughput

Provision Autoscale Throughput

Time to Live (TTL) in Azure Cosmos DB

Consistency Levels in Azure Csomos DB

Troubleshoot Request Rate Too Large

Troubleshoot Query Performance

What are Hands-on Labs

Hands-on Labs are real environments created by industry experts to help you learn. These environments help you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and let you learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.

Sign In
Welcome Back!

Psst…this one if you’ve been moved to ACG!

Get Started
Who’s going to be learning?