Using Data Pipeline to Export DynamoDB Data to S3

30 minutes
  • 2 Learning Objectives

About this Hands-on Lab

In this lab, we are going to use AWS Data Pipeline to copy DynamoDB data to an S3 bucket as a backup. Along the way, we’ll learn the different ways DynamoDB and Data Pipeline can be combined to create backups of DynamoDB data.
**Note that this lab has been updated to reflect changes in AWS; the latest steps can be found in the lab guide. The m4.large instance size must be used for the master and core instances.**

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Copy Subnet ID and S3 Bucket Name

Before we can create a data pipeline, we’ll need the name of the S3 bucket that we are going to output data to, as well as a Subnet ID so Data Pipeline knows where to launch the EMR cluster that executes the export.

To get the S3 bucket name, navigate to S3 in the AWS console and locate the provided S3 bucket. Copy the bucket name from the console (it should start with cfst-) and save this for the next objective.

Next, find the Subnet ID by navigating to the VPC console and locating the subnet with an internet gateway attached to its route table (i.e., a public subnet). Copy the Subnet ID and save this for the next objective.
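If you prefer scripting these lookups, the selection logic can be sketched as below. This is a minimal sketch with illustrative sample data: in the real lab you would fetch the lists with boto3 (`s3.list_buckets`, `ec2.describe_subnets`, `ec2.describe_route_tables`), and the helper names here are hypothetical.

```python
# Sketch: picking out the lab's S3 bucket and public subnet from listed resources.
# The sample data below is illustrative; real values come from boto3/AWS CLI calls.

def pick_lab_bucket(bucket_names):
    """Return the first bucket whose name starts with the lab's cfst- prefix."""
    return next(name for name in bucket_names if name.startswith("cfst-"))

def pick_public_subnet(subnets):
    """Return the ID of the first subnet flagged as having an internet gateway route.

    Each subnet dict here carries a precomputed 'has_igw_route' flag; with boto3
    you would derive it by checking the subnet's route table for an igw- target.
    """
    return next(s["SubnetId"] for s in subnets if s["has_igw_route"])

buckets = ["logs-archive", "cfst-1234-examplebucket"]
subnets = [
    {"SubnetId": "subnet-0aaa", "has_igw_route": False},
    {"SubnetId": "subnet-0bbb", "has_igw_route": True},
]

print(pick_lab_bucket(buckets))     # cfst-1234-examplebucket
print(pick_public_subnet(subnets))  # subnet-0bbb
```

The same two values (bucket name and Subnet ID) are what the console steps above have you copy by hand.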

Create Data Pipeline

Navigate to the Data Pipeline console and create a new pipeline to export our data from the LinuxAcademy DynamoDB table to our S3 bucket. Make sure the settings below are configured:

  1. The pipeline should be called backupdbtable.
  2. In the Build Using a Template field, use the Export DynamoDB Table to S3 template.
  3. The source table will be the LinuxAcademy DynamoDB Table that is already created.
  4. Logging should be set to the cfst- bucket from the first objective.
  5. The pipeline should be set to run on pipeline activation.

You should also set the following Architecture setting for the pipeline:

  1. Add in the Subnet ID parameter and use the Subnet ID from the first objective.
  2. Update the Core Instance Type to m4.large.
  3. Update the Master Instance Type to m4.large.
  4. Set the Resize Cluster Before Running parameter to false.

Once the parameters have been set, save and activate the pipeline to begin the export execution.
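Under the hood, the console steps above boil down to a pipeline definition plus a set of parameter values that Data Pipeline receives as `{id, stringValue}` pairs (the format used by `put-pipeline-definition` in the AWS CLI and boto3). A minimal sketch of assembling those values, assuming placeholder bucket and subnet values; the parameter IDs follow the naming style of the export template (`myDDBTableName`, `myOutputS3Loc`, etc.) but should be treated as illustrative rather than exact:

```python
import json

# Settings from the lab steps; the bucket name and subnet ID are placeholders
# standing in for the values copied in the first objective.
settings = {
    "myDDBTableName": "LinuxAcademy",
    "myOutputS3Loc": "s3://cfst-example-bucket/",
    "mySubnetId": "subnet-0bbb",
    "myCoreInstanceType": "m4.large",
    "myMasterInstanceType": "m4.large",
    "myResizeClusterBeforeRunning": "false",
}

# Data Pipeline expects parameter values as a list of {id, stringValue} pairs.
parameter_values = [{"id": k, "stringValue": v} for k, v in settings.items()]

print(json.dumps(parameter_values, indent=2))
```

In the console, activating the pipeline then kicks off the EMR cluster that performs the export.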

Additional Resources

How to Begin

Please go to the AWS Console using the link provided after lab creation is complete.

Log in using the credentials provided to you. You should have been given a user name of cloud_user and a randomly generated password.

Please make sure you are in the us-east-1 region before beginning.

Update both Core Instance Type and Master Instance Type to m4.large.

Other Uses

One more thing we can do is move data the other way: from S3 back into DynamoDB. We're not going to actually do it, but just to get an idea of how it works, you can go back into the Data Pipeline console after completing this lab. Click List Pipelines at the top so we've got a fresh screen, then click Create new pipeline. Take a look in the top section at the Source. If we select Build using a template and click the Choose... dropdown, we can see Export DynamoDB table to S3 (what we chose last time we were here). Right below that is Import DynamoDB backup data from S3. Outstanding!
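Structurally, the import direction mirrors the export: the template just trades the output location for an input one pointing at the backup folder in S3. A hedged sketch, with illustrative parameter IDs and a placeholder S3 path:

```python
# Sketch of parameter values for the reverse (import) direction.
# Parameter IDs mirror the export sketch's naming style and are illustrative;
# in the console you'd select "Import DynamoDB backup data from S3" instead.
import_settings = {
    "myDDBTableName": "LinuxAcademy",
    "myInputS3Loc": "s3://cfst-example-bucket/backup-folder/",
}

import_parameter_values = [
    {"id": k, "stringValue": v} for k, v in import_settings.items()
]
print(import_parameter_values)
```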

What are Hands-on Labs

Hands-on Labs are real environments created by industry experts to help you learn. These environments help you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and let you learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.
