Loading Data into a Redshift Cluster

1 hour
  • 3 Learning Objectives

About this Hands-on Lab

In this lab, you will load data from your choice of S3 bucket, or DyanmoDB table into a running Redshift cluster. You will need to create an IAM role that provides access to the S3 bucket or DynamoDB table and apply it to the Redshift cluster. Some experience with either S3 or DynamoDB will be useful, but we walk through loading data into each of these services.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Prepare the Source Data
  1. Visit this lab’s content repo and download the redshift-data.csv file to your local machine.
  2. Log in to the AWS Management Console with the credentials provided, search and select the S3 service.
  3. Click create, and name it redshift-import20200923. Select Create once again.
  4. After creation, click the bucket’s name.
  5. Select the Upload and Add files buttons and choose the redshift-data.csv file that you downloaded in step 1. Afterward, select Upload.
  6. Click on Services to find and select the DynamoDB service.
  7. Select Create table:

    • Table name: redshift-import
    • Primary key: ID (select Number in the drop-down menu)
    • Click Create
  8. Generate an Access Key by clicking on your username, and choose My Security Credentials.
  9. Under Access key, choose Create Access Key and copy the Access key ID and Secret access key. Keep this, you will need it momentarily.
  10. Switch to the bastion host’s (cloud server) terminal and download this lab’s content repository.
    git clone https://github.com/linuxacademy/content-aws-database-specialty.git
    cd content-aws-database-specialty/S06_Additional Database Services/
  11. Configure the AWS CLI client and paste in the Access key ID and Secret access key:

    aws configure
  12. Choose the Default region name: us-east-1 and leave the Default output format to None by pressing the return key.
  13. Load the .json into the DynamoDB table:

    aws dynamodb batch-write-item --request-items file://redshift-data.json

    "UnprocessedItems" will appear if successful.

  14. Return to the AWS management console, then choose the DynamoDB service.
  15. Click on Tables, redshift-import, and the items tab to view the items imported from the JSON file.
Create IAM Role
  1. In the AWS Management Console search and select for the IAM service.

  2. Select Roles >> Create role >> EC2 >> Next Permissions.

  3. In the search bar type S3 and select AmazonS3ReaOnlyAccess. Additionally, search for DynamoDB and select AmazonDynamoDBReadOnlyAccess.

  4. Choose Next:Tags and type in name (for the key) and redshift-import (for the value). Select Next:Review.

  5. On this Create role page:

    • Role name: redshift-import
    • Select Create Role.
  6. Search and click on redshift-import, choose Trust relationships, and Edit trust relationship. Change Service: section to Service: "redshift.amazonaws.com".

  7. Select Update trust policy.

  8. In the AWS Management Console search and select for the Redshift service.

  9. Select 1 link under the Cluster.

  10. Choose this cluster, click Actions, and Manage IAM roles.

  11. From the drop-down choose redshift-import, Add IAM role, and Done.

Load the Data
  1. In the AWS Management Console search and select for the Redshift service.

  2. Click on 1, the cluster name, and copy the Endpoint name.

  3. Select Services and search for and select the IAM service, Roles, redshift-import, and copy the Role ARN. This will be used momentarily.

  4. Switch to the bastion host’s (cloud server) terminal and save the cluster’s import name into a variable. Be sure to erase everything after the .com in the endpoint (port number and import-test):

    export PGHOST=<RedshiftEndpoint>
  5. Connect to the cluster and enter in the password, provided in this lab’s description:

    psql -U masteruser -p 5439 import-test
  6. Create the tables:

    create table s3_users (ID int, Name varchar, Department varchar, ExpenseCode int);   
    create table dynamodb_users (ID int, Name varchar, Department varchar, ExpenseCode int);   
  7. Import the data from DynamoDB. Be sure to replace the IAM ARN with yours.:

    copy dynamodb_users from 
    'dynamodb://redshift-import' iam_role 'paste in IAM Role ARN' readratio 50;
  8. Query the table to ensure rows were inserted:

    select * from dynamodb_users ;
  9. Import data from the S3 bucket:

    copy s3_users from 's3://redshift-import20200923/redshift-data.csv' iam_role 'paste in IAM Role ARN' delimiter ',';
  10. Query the table to ensure rows were inserted:

    select * from s3_users ;

Additional Resources

In this lab, you work as a Database Administrator and manage your company's Redshift cluster. The Development team has requested a way to import data into the Redshift cluster from either an S3 bucket or DynamoDB table. They have provided you with a small sample of the data to be imported from the S3 bucket or DynamoDB table into Redshift.

Note: Use the following credentials to access the Redshift cluster:

  • User: masteruser
  • Password: MasterPasswd2020!

This lab's content repo is located here.

What are Hands-on Labs

Hands-on Labs are real environments created by industry experts to help you learn. These environments help you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and let you learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.

Get Started
Who’s going to be learning?

How many seats do you need?

  • $499 USD per seat per year
  • Billed Annually
  • Renews in 12 months

Ready to accelerate learning?

For over 25 licenses, a member of our sales team will walk you through a custom tailored solution for your business.


Sign In
Welcome Back!

Psst…this one if you’ve been moved to ACG!