Bulk Load Data into Cosmos DB for NoSQL

Bulk load refers to scenarios where you need to move a large volume of data with as much throughput as possible. Workloads can be batch processes, such as nightly data loads, or streaming processes where you receive hundreds of thousands of documents that you need to update. In this hands-on lab, you will use the Cosmos DB SDK along with vanilla C# code to enable bulk execution on a CosmosClient instance. Then you will generate synthetic data to test a bulk load of 1,000 JSON documents into Cosmos DB for NoSQL. Students with solid experience coding in C# on .NET, or experience with the Cosmos DB for NoSQL SDK in any language, will be the most prepared to complete this lab without assistance. However, tips are provided for developers with less experience, and the solution videos and the lab guide contain full solutions.


Path Info

Level: Intermediate
Duration: 45m
Published: Dec 16, 2022


Table of Contents

  1. Challenge

    Housekeeping

    1. Open an incognito or in-private window and log in to the Azure portal using the user name and password provided in the lab environment.
    2. From within the portal, start the Cloud Shell, select Bash (rather than PowerShell), and set it up with new backing storage.
    3. From the Bash command prompt, execute the git clone command using the URL provided in the Additional Information and Resources section of the lab, followed by DP420Labs to alias the downloaded folder to a friendly name.
    4. Once the project is downloaded, use Cloud Editor to open the Program.cs file.
    5. From the Bash command prompt, change to the working directory: cd DP420Labs/DP420/BulkLoad

    NOTE: You are free to write the code for this lab in Visual Studio Code or another IDE if you have experience in that environment. Just make sure you download the GitHub project file so that you have the right library references and using directives. Be aware that the lab guide and the solution video are based on working in the Cloud Shell editor, but working elsewhere won't substantially change the code you write.

  2. Challenge

    Instantiating the CosmosClient Object

    1. Navigate to the Cosmos DB account that is already set up for you and copy the primary connection string to connect to Cosmos DB in your code.
    2. Navigate to Data Explorer and note the name of the database and container already deployed to your account. The partition key for the container is itemId. You will need this information later.
    3. Run a quick SQL query to confirm that the container is empty.
    4. In the main method of the Program.cs file, author the code required to connect to your Cosmos DB account. Operate on the database and container already set up in that account. When you instantiate the CosmosClient, you will also need to enable bulk execution.

    Tips:

    • You will need to instantiate a CosmosClient, a Database, and a Container using the connection string, database name, and container name you retrieved from the portal.
    • There may be ambiguity when instantiating the Database object because the Bogus library also has a Database class, so you can use the fully qualified path: Microsoft.Azure.Cosmos.Database
    • You will need to use a CosmosClientOptions class in order to set AllowBulkExecution to true, or you can optionally use the fluent CosmosClientBuilder class.
    • If you still need help after considering these tips, you can copy-paste the code from the lab guide and/or watch the solution video.

    NOTE: If you do copy/paste the code from the lab guide, first save the connection string you copied from the portal so that you do not have to retrieve it again.
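
    One possible shape for the connection code described in this challenge is sketched below. It is not the lab guide's solution. It assumes the Microsoft.Azure.Cosmos package that the lab project references; the three placeholder strings must be replaced with the values from your own account, and the class and method names (BulkConnectionSketch, CreateBulkOptions, GetContainer) are hypothetical.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class BulkConnectionSketch
{
    // Client options with bulk execution switched on, as the challenge requires.
    public static CosmosClientOptions CreateBulkOptions() =>
        new CosmosClientOptions { AllowBulkExecution = true };

    // Returns a reference to an existing container. GetDatabase and GetContainer
    // only build local proxy objects; nothing is sent to the service here.
    // The client is left undisposed because this is a short-lived console sketch.
    public static Container GetContainer(
        string connectionString, string databaseName, string containerName)
    {
        var client = new CosmosClient(connectionString, CreateBulkOptions());

        // Fully qualified to avoid a clash with another type named Database.
        Microsoft.Azure.Cosmos.Database database = client.GetDatabase(databaseName);
        return database.GetContainer(containerName);
    }

    public static async Task Main()
    {
        // Placeholders (assumptions): use the values you noted in the portal.
        string connectionString = "<primary-connection-string>";
        string databaseName = "<database-name>";
        string containerName = "<container-name>";

        Container container = GetContainer(connectionString, databaseName, containerName);

        // Reading the container properties is the first real request,
        // so it doubles as a connectivity check.
        ContainerResponse response = await container.ReadContainerAsync();
        Console.WriteLine(
            $"Connected to '{response.Resource.Id}', partition key path {response.Resource.PartitionKeyPath}");
    }
}
```

    If you prefer the fluent style mentioned in the tips, the equivalent should be along the lines of new CosmosClientBuilder(connectionString).WithBulkExecution(true).Build(), using the Microsoft.Azure.Cosmos.Fluent namespace.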

  3. Challenge

    Loading Synthetic Data

    You are not expected to write the data generation code from scratch; you can simply copy/paste the following code. However, do take a few minutes to study it, taking particular note of the Generate call that produces 1,000 records. That count is about right for our bulk load test; if you set it much higher, you are likely to receive a 429 throttling error.

    1. Inside the Main method, following the Cosmos DB connection code, paste this code:

        var fruit = new[] {"apple", "peach", "lemon", "strawberry", "pear"};
        // Get items from a source; here we use a fake data generator
        List<GenericItem> itemsToInsert = new Faker<GenericItem>()
            .RuleFor(i => i.id, f => Guid.NewGuid())
            // itemId is the partition key
            .RuleFor(i => i.itemId, f => f.Random.Number(1, 10))
            .RuleFor(i => i.itemName, f => f.PickRandom(fruit))
            .Generate(1000);

    2. Outside of the Main method, paste this code that creates an item class for the data generator:

        // Shape of each synthetic document
        public class GenericItem
        {
            public Guid id { get; set; }
            public string? itemName { get; set; }
            public int itemId { get; set; }
        }
    
  4. Challenge

    Executing the Code

    The benefit of using the SDK to batch up data for bulk load is that you do not have to write the batching and caching logic. The SDK takes care of that under the covers. You just need to write vanilla C# code to add the items to the container.

    1. In the previous objective, the code populates a List<GenericItem> object, called itemsToInsert, with synthetic JSON documents. In this objective, you need to write code that iterates over that list and asynchronously inserts the items into the Cosmos DB container.

    NOTE: Better yet, you can create another List, but this time a List<Task> object. Iterate over itemsToInsert and load up the List<Task> object with the tasks that perform the container insert. Then await the combined Task, which is the return type expected by the Main method.

    2. After you have written the code, save the changes to the Program.cs file. Then, build the code. Assuming it builds without error, run the code.

    3. Assuming the code runs successfully, go back to the Data Explorer and run the SQL query again. You should now see the documents in the container.

    Tips:

    • Create a new List<Task> object and use a foreach construct to loop over the itemsToInsert list in order to build a list of tasks that insert items into the container.
    • Use the CreateItemAsync<GenericItem> member on the Container object, which you instantiated in the first code block, to add items to the container.
    • When inserting an item, you need a reference to the item and, optionally, the container partition key, which isn't required but is more efficient for the database engine. If you decide to include it, the partition key for the container is itemId.
    • Building a list of tasks does not by itself wait for the inserts to finish. To wait for all of the work defined in the tasks, use this syntax, which yields a Task, the data type expected by the Main method: await Task.WhenAll([whatever you named your batch of tasks]); Remember: You don't have to worry about collecting the documents into batches before inserting. The SDK code takes care of that for you.
    • If you still need help after considering these tips, you can copy/paste the code from the lab guide or watch the solution video.
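
    The tips above can be sketched roughly as follows. This is one possible arrangement, not the lab guide's solution. The class and method names (BulkLoadSketch, RunAllAsync, BulkInsertAsync) are hypothetical, and the insertOne delegate is an assumption introduced only so that the foreach and Task.WhenAll logic can be exercised without a live account. GenericItem repeats the class from the previous challenge.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Same shape as the GenericItem class from the previous challenge.
public class GenericItem
{
    public Guid id { get; set; }
    public string? itemName { get; set; }
    public int itemId { get; set; }
}

public static class BulkLoadSketch
{
    // Starts one task per item, then awaits them all together.
    // Returns the number of tasks that were started.
    public static async Task<int> RunAllAsync<T>(
        IEnumerable<T> items, Func<T, Task> insertOne)
    {
        var tasks = new List<Task>();
        foreach (T item in items)
        {
            // The operation is started here but not awaited individually.
            tasks.Add(insertOne(item));
        }

        // Completes when every insert has finished; throws if any insert failed.
        await Task.WhenAll(tasks);
        return tasks.Count;
    }

    // Wires the loop to the container. With AllowBulkExecution enabled on the
    // client, the SDK groups these concurrent operations into batches.
    public static Task<int> BulkInsertAsync(
        Container container, IEnumerable<GenericItem> itemsToInsert) =>
        RunAllAsync<GenericItem>(itemsToInsert, item =>
            container.CreateItemAsync<GenericItem>(item, new PartitionKey(item.itemId)));
}
```

    From the Main method, a call such as await BulkLoadSketch.BulkInsertAsync(container, itemsToInsert); would then run the load, where container and itemsToInsert are the objects created in the earlier challenges.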

The Cloud Content team comprises subject matter experts hyper-focused on services offered by the leading cloud vendors (AWS, GCP, and Azure), as well as cloud-related technologies such as Linux and DevOps. The team is thrilled to share their knowledge to help you build modern tech solutions from the ground up, secure and optimize your environments, and so much more!
