Querying Data from Multiple Redshift Spectrum Tables

1 hour
  • 3 Learning Objectives

About this Hands-on Lab

In this lab, we will utilize Redshift Spectrum to create several tables from data stored in S3, and then test to ensure we are able to perform queries that include joins of the tables we have created.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Inspect the Lab Environment

Use the AWS Web Console to investigate the lab environment. You should find an S3 bucket named users-data-<ACCOUNT_NUMBER>, which contains a single JSON file as well as a single node Redshift cluster.

We recommend downloading and inspecting the JSON data, as you will need to write SQL pertinent to this data to create your Redshift Spectrum tables.

Create the Redshift Spectrum Tables

Using either the Redshift web console query editor or the client of your choosing, create the 5 requested Redshift Spectrum tables.

GitHub Repo

Test Your Newly Created Redshift Spectrum Tables

Using either the Redshift web console query editor or the client of your choosing, run the supplied test query to ensure your tables are correctly formed and it is possible to perform joins between them.

Additional Resources

Scenario

You have been tasked with testing accessing data stored in S3 from a Redshift cluster. Your team is hoping to realize cost savings by moving infrequently accessed non-latency sensitive data outside of your company's production Redshift cluster. They've provided the required tables below with the requisite column headers, as well as test data in an S3 bucket. You will need to create Redshift Spectrum tables with the appropriate DDL and run the provided test query to ensure Redshift Spectrum will function for the planned use case.

Additional Resources

GitHub Repo

Redshift Create External Table

Required Tables

Names
id_name
id_value
gender
name_title
name_first
name_last

Location
id_name
id_value
location_street_number
location_street_name
location_city
location_state
location_country
location_postcode
location_coordinates_latitude
location_coordinates_longitude
location_timezone_offset
location_timezone_description
nat

Age
id_name
id_value
dob_date
dob_age
registered_date
registered_age

Contact
id_name
id_value
email
phone
cell

Picture
id_name
id_value
picture_large
picture_medium
picture_thumbnail

Test Query

select 
    names.name_first as first_name, 
    names.name_last as last_name, 
    location.location_state as state, 
    age.dob_age as age, 
    contact.cell as cell, 
    picture.picture_large as picture
from users_data.names
    join users_data.location on users_data.names.id_value = users_data.location.id_value 
    join users_data.age on users_data.names.id_value = users_data.age.id_value 
    join users_data.contact on users_data.names.id_value = users_data.contact.id_value
    join users_data.picture on users_data.names.id_value = users_data.picture.id_value
order by age
limit 10;

What are Hands-on Labs

Hands-on Labs are real environments created by industry experts to help you learn. These environments help you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and let you learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.

Sign In
Welcome Back!

Psst…this one if you’ve been moved to ACG!

Get Started
Who’s going to be learning?