Using Data Pipeline to Export DynamoDB Data to S3

In this lab, we are going to use Data Pipeline to copy DynamoDB data to an S3 bucket as a backup. We'll learn the different ways this can be done with DynamoDB and Data Pipeline to create backups of DynamoDB data. **Note: This lab has been updated to reflect changes in AWS; the latest steps can be found in the lab guide. The m4.large instance size must be used for the master and core instances.**


Path Info

Level: Intermediate
Duration: 30m
Published: Nov 06, 2020


Table of Contents

  1. Challenge

    Copy Subnet ID and S3 Bucket Name

    Before we can create a data pipeline, we'll need the name of the S3 bucket that we are going to output data to, as well as a Subnet ID so the pipeline knows where to launch the EMR cluster that executes the export.

    To get the S3 bucket name, navigate to S3 in the AWS console and locate the provided S3 bucket. Copy the bucket name from the console (it should start with cfst-) and save it for the next objective.

    Next, find the Subnet ID by navigating to the VPC console and locating the subnet with a CIDR range of 10.0.0.0/24 whose route table has an internet gateway attached. Copy the Subnet ID and save it for the next objective. (A scripted way to look up both values is sketched after the challenge list below.)

  2. Challenge

    Create Data Pipeline

    Navigate to the Data Pipeline console and create a new pipeline to export our data from the LinuxAcademy DynamoDB table to our S3 bucket. Make sure the settings below are configured:

    1. The pipeline should be called backupdbtable.
    2. In the Build Using a Template field, use the Export DynamoDB Table to S3 template.
    3. The source table will be the LinuxAcademy DynamoDB Table that is already created.
    4. Logging should be set to the cfst- bucket from the first objective.
    5. The pipeline should be set to run on pipeline activation.

    You should also set the following Architect settings for the pipeline:

    1. Add in the Subnet ID parameter and use the Subnet ID from the first objective.
    2. Update the Core Instance Type to m4.large.
    3. Update the Master Instance Type to m4.large.
    4. Set the Resize Cluster Before Running parameter to false.

    Once the parameters have been set, save and activate the pipeline to begin the export. (A minimal activation sketch using boto3 follows below.)
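If you'd rather grab the two values from the first challenge programmatically instead of clicking through the console, here is a minimal sketch using boto3. It assumes credentials for the lab account are already configured; the cfst- prefix and the 10.0.0.0/24 CIDR come from the lab instructions, and everything else is illustrative:

```python
import boto3

# Find the provided lab bucket (lab buckets are named with a cfst- prefix).
s3 = boto3.client("s3")
buckets = [b["Name"] for b in s3.list_buckets()["Buckets"] if b["Name"].startswith("cfst-")]
print("S3 bucket:", buckets[0] if buckets else "not found")

# Find the subnet with the 10.0.0.0/24 CIDR range.
ec2 = boto3.client("ec2")
subnets = ec2.describe_subnets(
    Filters=[{"Name": "cidr-block", "Values": ["10.0.0.0/24"]}]
)["Subnets"]

for subnet in subnets:
    # Check that the subnet's route table has an internet gateway route.
    # (Assumes the subnet has an explicit route table association.)
    route_tables = ec2.describe_route_tables(
        Filters=[{"Name": "association.subnet-id", "Values": [subnet["SubnetId"]]}]
    )["RouteTables"]
    has_igw = any(
        route.get("GatewayId", "").startswith("igw-")
        for rt in route_tables
        for route in rt["Routes"]
    )
    if has_igw:
        print("Subnet ID:", subnet["SubnetId"])
```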
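The pipeline itself is easiest to build from the Export DynamoDB Table to S3 template in the console, but activation can also be scripted. The sketch below activates an already-defined pipeline with boto3. The parameter IDs (mySubnetId, myCoreInstanceType, myMasterInstanceType, myResizeClusterBeforeRunning) are assumptions about the template's parameter names and may differ in your pipeline definition, so confirm them with get_pipeline_definition first:

```python
import boto3

dp = boto3.client("datapipeline")

# Find the pipeline created from the console template (named backupdbtable in this lab).
pipelines = dp.list_pipelines()["pipelineIdList"]
pipeline_id = next(p["id"] for p in pipelines if p["name"] == "backupdbtable")

# Inspect the definition to confirm the actual parameter IDs used by the template.
definition = dp.get_pipeline_definition(pipelineId=pipeline_id)
print([p["id"] for p in definition.get("parameterObjects", [])])

# Activate the pipeline, overriding parameters per the lab requirements.
# NOTE: these parameter IDs are assumptions -- adjust them to match the output above,
# and replace subnet-xxxxxxxx with the Subnet ID from the first challenge.
dp.activate_pipeline(
    pipelineId=pipeline_id,
    parameterValues=[
        {"id": "mySubnetId", "stringValue": "subnet-xxxxxxxx"},
        {"id": "myCoreInstanceType", "stringValue": "m4.large"},
        {"id": "myMasterInstanceType", "stringValue": "m4.large"},
        {"id": "myResizeClusterBeforeRunning", "stringValue": "false"},
    ],
)
```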

The Cloud Content team comprises subject matter experts hyper-focused on services offered by the leading cloud vendors (AWS, GCP, and Azure), as well as cloud-related technologies such as Linux and DevOps. The team is thrilled to share their knowledge to help you build modern tech solutions from the ground up, secure and optimize your environments, and so much more!

What's a lab?

Hands-on Labs are real environments created by industry experts to help you learn. These environments help you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and let you learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.

Provided environment for hands-on practice

We will provide the credentials and environment necessary for you to practice right within your browser.

Guided walkthrough

Follow along with the author’s guided walkthrough and build something new in your provided environment!

Did you know?

On average, you retain 75% more of your learning if you get time for practice.
