In this lab, we are going to use AWS Data Pipeline to copy data from a DynamoDB table to an S3 bucket as a backup. Along the way, we'll see how DynamoDB and Data Pipeline work together to create backups of DynamoDB data.
**Note that this lab has been updated to reflect changes in AWS; the latest steps can be found in the lab guide. The m4.large instance size must be used for the master and core instances.**
Learning Objectives
Successfully complete this lab by achieving the following learning objectives:
- Copy Subnet ID and S3 Bucket Name
Before we can create a data pipeline, we'll need the name of the S3 bucket we are going to output data to, as well as a Subnet ID so the pipeline knows where to launch the EMR cluster that executes the export.
To get the S3 bucket name, navigate to S3 in the AWS console and locate the provided S3 bucket. Copy the bucket name from the console (it should start with cfst-) and save this for the next objective.
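The console steps above are all the lab requires, but the same lookup can be scripted. Below is a minimal boto3 sketch, assuming credentials for the lab account are already configured; the `cfst-` prefix comes from the lab, everything else is generic:

```python
import boto3

s3 = boto3.client("s3")

# List all buckets in the account and keep the lab-provided one,
# which is named with a "cfst-" prefix.
buckets = [b["Name"] for b in s3.list_buckets()["Buckets"]]
lab_bucket = next(name for name in buckets if name.startswith("cfst-"))
print(f"S3 bucket: {lab_bucket}")
```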
Next, find the Subnet ID by navigating to the VPC console and locating the subnet with a CIDR range of 10.0.0.0/24 whose route table has an internet gateway attached. Copy the Subnet ID and save it for the next objective.
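As with the bucket, this lookup can be scripted. Here is a minimal boto3 sketch, assuming the subnet has an explicit route table association (a subnet using the VPC's main route table would need an extra lookup):

```python
import boto3

ec2 = boto3.client("ec2")

# Find the subnet with the 10.0.0.0/24 CIDR range.
subnets = ec2.describe_subnets(
    Filters=[{"Name": "cidr-block", "Values": ["10.0.0.0/24"]}]
)["Subnets"]

for subnet in subnets:
    # Check the route table associated with the subnet for a route
    # through an internet gateway (i.e., a public subnet).
    tables = ec2.describe_route_tables(
        Filters=[
            {"Name": "association.subnet-id", "Values": [subnet["SubnetId"]]}
        ]
    )["RouteTables"]
    for table in tables:
        routes = table.get("Routes", [])
        if any(r.get("GatewayId", "").startswith("igw-") for r in routes):
            print(f"Subnet ID: {subnet['SubnetId']}")
```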
- Create Data Pipeline
Navigate to the Data Pipeline console and create a new pipeline to export our data from the LinuxAcademy DynamoDB table to our S3 bucket. Make sure the below settings are set (a sketch of the equivalent API call follows this list):
- The pipeline should be called backupdbtable.
- In the Build Using a Template field, use the Export DynamoDB Table to S3 template.
- The source table will be the LinuxAcademy DynamoDB Table that is already created.
- Logging should be set to the cfst- bucket from the first objective.
- The pipeline should be set to run on pipeline activation.
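For reference, registering the same pipeline at the API level might look like the boto3 sketch below. The console's Export DynamoDB Table to S3 template supplies the actual definition, so only the pipeline shell is created here; the `uniqueId` value is an arbitrary idempotency token of our choosing:

```python
import boto3

dp = boto3.client("datapipeline")

# Register the pipeline shell. The template-generated definition is
# applied separately with put_pipeline_definition.
pipeline = dp.create_pipeline(
    name="backupdbtable",
    uniqueId="backupdbtable-lab",  # idempotency token, any unique string
)
print(pipeline["pipelineId"])
```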
You should also set the following Architect settings for the pipeline (a sketch of the matching definition edits follows this list):
- Add in the Subnet ID parameter and use the Subnet ID from the first objective.
- Update the Core Instance Type to m4.large.
- Update the Master Instance Type to m4.large.
- Set the Resize Cluster Before Running parameter to false.
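In the console these are edits made in Architect. Below is a hedged boto3 sketch of the same edits at the API level; the object IDs `EmrClusterForBackup` and `TableBackupActivity` and the field names follow the export template as an assumption, so verify them against your pipeline's actual definition before applying anything:

```python
import boto3

dp = boto3.client("datapipeline")
pipeline_id = "df-EXAMPLE1234567"  # hypothetical ID from create_pipeline

# Pull the definition the template generated so we can edit it in place.
definition = dp.get_pipeline_definition(pipelineId=pipeline_id)


def set_fields(obj, updates):
    """Replace or add stringValue fields on one pipeline object."""
    obj["fields"] = [f for f in obj["fields"] if f["key"] not in updates] + [
        {"key": k, "stringValue": v} for k, v in updates.items()
    ]


for obj in definition["pipelineObjects"]:
    if obj["id"] == "EmrClusterForBackup":  # the template's EMR resource
        set_fields(obj, {
            "subnetId": "subnet-0123456789abcdef0",  # ID from objective one
            "coreInstanceType": "m4.large",
            "masterInstanceType": "m4.large",
        })
    elif obj["id"] == "TableBackupActivity":  # the template's EmrActivity
        set_fields(obj, {"resizeClusterBeforeRunning": "false"})

# Push the edited definition back to the pipeline.
dp.put_pipeline_definition(
    pipelineId=pipeline_id,
    pipelineObjects=definition["pipelineObjects"],
    parameterObjects=definition.get("parameterObjects", []),
    parameterValues=definition.get("parameterValues", []),
)
```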
Once the parameters have been set, save and activate the pipeline to begin the export execution.
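If you are working from the API instead of the console, activation and a simple status check might look like this minimal boto3 sketch (the pipeline ID is hypothetical):

```python
import boto3

dp = boto3.client("datapipeline")
pipeline_id = "df-EXAMPLE1234567"  # hypothetical ID

# Activate the pipeline; with "run on activation" this starts the export.
dp.activate_pipeline(pipelineId=pipeline_id)

# Check the overall pipeline state (@pipelineState is a standard field).
description = dp.describe_pipelines(pipelineIds=[pipeline_id])
for field in description["pipelineDescriptionList"][0]["fields"]:
    if field["key"] == "@pipelineState":
        print("Pipeline state:", field["stringValue"])
```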