Manually Migrating Data Between Redshift Clusters

45 minutes
  • 5 Learning Objectives

About this Hands-on Lab

In this lab, we'll use the Redshift `UNLOAD` and `COPY` commands to migrate data from an existing Redshift cluster to a second cluster, which we will launch in the course of completing the lab.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Investigate the Lab Environment

Log in to the provided AWS Account and inspect the provided resources.

You should find a Redshift cluster with the ID users-cluster as well as an empty S3 bucket with a name that begins with users-data-. Additionally, an IAM role named RedshiftS3 is attached to users-cluster; this role grants the cluster access to S3 and should also be attached to the second Redshift cluster when you launch it.

Note: The ARN of this role is provided in the lab interface, as is the name of the S3 bucket for easy reference.

Launch the Target Redshift Cluster

In the Redshift web console, launch a new cluster with the same configuration as users-cluster, using the ID users-cluster-2.
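If you prefer the SDK to the console, the launch can be sketched with boto3. Everything below other than the cluster ID is an assumption; copy the real node type, node count, and credentials from the users-cluster configuration, and substitute the role ARN shown in the lab interface.

```python
# Sketch of launching users-cluster-2 with boto3 (the console works equally
# well). Node type, node count, and credentials are assumptions -- mirror
# whatever users-cluster actually uses.
new_cluster = {
    "ClusterIdentifier": "users-cluster-2",
    "NodeType": "dc2.large",            # assumed; match users-cluster
    "NumberOfNodes": 2,                 # assumed; match users-cluster
    "MasterUsername": "awsuser",        # assumed
    "MasterUserPassword": "<password>",
    "IamRoles": ["<RedshiftS3 ARN>"],   # lets COPY read from the S3 bucket
}

# Uncomment to actually launch the cluster:
# import boto3
# boto3.client("redshift").create_cluster(**new_cluster)
```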

Copy the Existing Redshift Table to S3

Utilizing either the Redshift web console query editor or the client of your choosing, use the UNLOAD command to copy the existing test table to the provided S3 bucket in parquet format.

UNLOAD ('select * from users_data')
TO 's3://<users-data-bucket>/users_data_'
IAM_ROLE '<RedshiftS3 ARN>'
FORMAT AS PARQUET;

Copy Data from S3 to the Newly Launched Redshift Cluster

Utilizing either the Redshift web console query editor or the client of your choosing, create a new table in users-cluster-2 that matches the users_data table on users-cluster. Once this table is created, use the COPY command to load the data that was backed up to S3 into your new table.

Note: Table definition information is provided in the lab instructions.

create table users_data(
  id_value varchar(64),
  name_first varchar(64),
  name_last varchar(64),
  location_country varchar(32),
  dob_age int,
  picture_large varchar(64),
  primary key(id_value)
)
distkey(location_country)
compound sortkey(id_value);

COPY users_data
FROM 's3://<users-data-bucket>/users_data_'
IAM_ROLE '<RedshiftS3 ARN>'
FORMAT AS PARQUET;
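The UNLOAD and COPY statements are mirror images of each other, so for the proof of concept it can help to render both from the same bucket name and role ARN, keeping the S3 path and IAM role consistent in both directions. A minimal sketch (the `users_data_` key prefix and the example ARN are assumptions):

```python
# Sketch: render matching UNLOAD and COPY statements from one bucket name
# and role ARN so the S3 path and IAM role never drift apart.
def migration_sql(bucket: str, role_arn: str, table: str = "users_data"):
    path = f"s3://{bucket}/{table}_"   # assumed key prefix
    unload = (
        f"UNLOAD ('select * from {table}')\n"
        f"TO '{path}'\n"
        f"IAM_ROLE '{role_arn}'\n"
        f"FORMAT AS PARQUET;"
    )
    copy = (
        f"COPY {table}\n"
        f"FROM '{path}'\n"
        f"IAM_ROLE '{role_arn}'\n"
        f"FORMAT AS PARQUET;"
    )
    return unload, copy

# Example values only -- substitute your real bucket name and role ARN:
unload_sql, copy_sql = migration_sql(
    "users-data-example", "arn:aws:iam::123456789012:role/RedshiftS3"
)
```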
Check Your Data

Once the above objectives are fulfilled, you should be able to run the following query from both clusters and receive an identical response:

select * from users_data limit 10;
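A quick way to confirm the two clusters agree is to fetch the same rows from each (via whichever client you used above) and compare them ignoring order. A hedged sketch of that check, using toy data in place of real query results:

```python
# Sketch: compare result sets fetched from each cluster (as lists of row
# tuples, e.g. from a cursor's fetchall()), ignoring row order. How you
# connect and fetch is up to your client of choice.
def same_rows(rows_a, rows_b) -> bool:
    return sorted(rows_a) == sorted(rows_b)

# Toy data standing in for real query results:
source = [("id1", "Ada"), ("id2", "Grace")]
target = [("id2", "Grace"), ("id1", "Ada")]
```

Note that `limit 10` without an `order by` is not guaranteed to return the same rows on each run, so adding an `order by id_value` (or comparing full row counts) makes the check deterministic.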

Additional Resources

You have been presented with a few pain points to solve around your company's Redshift solution. The original Redshift cluster launched for the company's analytics stack has become underpowered over time. Some groups wish to create incremental backups of certain tables to S3 in a format that can be plugged into data lake solutions, while others want select pieces of the main Redshift schema split out to new department-specific clusters.

You've come up with a plan to utilize the UNLOAD and COPY commands to facilitate all of the above, and need to test a proof of concept to ensure that all pain points above can be addressed in this manner.

You've requisitioned a test AWS account with a Redshift cluster containing a relatively small test table in it. Also provided in the AWS account is an S3 bucket which will function as an intermediary storage point between Redshift clusters, as well as a point to test backup/data lake solutions.

The existing table has the following definition:

users_data (
  id_value varchar(64),
  name_first varchar(64),
  name_last varchar(64),
  location_country varchar(32),
  dob_age int,
  picture_large varchar(64),
  primary key(id_value)
)
distkey(location_country)
compound sortkey(id_value);
