Querying Real Estate Data in S3 Using Amazon Athena

1.5 hours
  • 5 Learning Objectives

About this Hands-on Lab

In this AWS hands-on lab, you’ll use Amazon Athena to query sample data on sold Manhattan houses stored in Amazon S3. You’ll first upload the sample data to Amazon S3 in a Hive-style partitioned folder layout, then create a table over that data in Amazon Athena, and finally use the Athena query editor to run SQL queries against the real estate data.

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Populate S3 with Manhattan Real Estate Data

Download the CSV files from this lab’s GitHub repository, and upload them to Amazon S3.

Note: For partitioning purposes, make sure to create a separate folder for each CSV file.
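
The folder-per-file layout described in the note can be sketched in Python as follows. This is a minimal sketch, not the lab's own solution: it assumes the partition column is called `neighborhood` and that each CSV file is named after its neighborhood. Those names and the `real-estate` prefix are hypothetical and are not taken from the lab's repository.

```python
from pathlib import PurePosixPath


def partition_key(csv_name: str, prefix: str = "real-estate",
                  column: str = "neighborhood") -> str:
    """Return a Hive-style S3 object key (column=value folder) for one CSV file.

    The partition value is derived from the file name, which is an assumed
    naming convention, not something the lab prescribes.
    """
    value = PurePosixPath(csv_name).stem.strip().lower().replace(" ", "_")
    return f"{prefix}/{column}={value}/{csv_name}"


if __name__ == "__main__":
    # Hypothetical file names, used only to show the resulting layout.
    for name in ["chelsea.csv", "upper east side.csv"]:
        print(partition_key(name))
```

Each file ends up in its own `column=value` folder, which is the layout Hive-style partition discovery looks for.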

Set Up Amazon Athena

Create a folder in Amazon S3 for query results, then update the query result location in the Amazon Athena settings to point to that folder.
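
The query result location is entered as an `s3://` URI. The helper below is a small sketch of how that URI is typically assembled; the function name and the bucket and folder names in the usage example are hypothetical.

```python
def query_result_location(bucket: str, folder: str) -> str:
    """Build the s3:// URI to use as the Athena query result location."""
    bucket = bucket.strip().strip("/")
    folder = folder.strip().strip("/")
    if not bucket or not folder:
        raise ValueError("both bucket and folder are required")
    # The trailing slash marks the location as a folder (prefix), not an object.
    return f"s3://{bucket}/{folder}/"


if __name__ == "__main__":
    # Hypothetical bucket and folder names.
    print(query_result_location("my-lab-bucket", "athena-results"))
```

The same value can be passed programmatically, for example as `ResultConfiguration={"OutputLocation": ...}` in the boto3 Athena client's `start_query_execution` call, although this lab sets it through the console.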

Create a Table from the S3 Bucket Metadata

Create a database and a table using the first and second queries found in the README file. If everything is successful, you should see the table listed under Tables and views.
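
For orientation, the statements in this step might look roughly like the ones generated below. This is a sketch under stated assumptions, not a copy of the README queries: the column list comes from the tip in this lab description, while the database name, table name, partition column (`neighborhood`), and S3 location are hypothetical. It also assumes plain comma-delimited files with a header row and no quoted commas inside fields.

```python
# Column names and types taken from this lab description's Create Table tip.
COLUMNS = [
    ("price", "double"),
    ("bedrooms", "double"),
    ("bathrooms", "double"),
    ("sqft", "double"),
    ("status", "string"),
    ("address", "string"),
]


def create_database_ddl(database: str) -> str:
    """Return an Athena statement that creates the database if it is missing."""
    return f"CREATE DATABASE IF NOT EXISTS {database}"


def create_table_ddl(database: str, table: str, location: str,
                     partition_column: str = "neighborhood") -> str:
    """Return an Athena DDL statement for a partitioned external table over CSV."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in COLUMNS)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {cols}\n"
        ")\n"
        f"PARTITIONED BY ({partition_column} string)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{location}'\n"
        # Assumption: each CSV file starts with one header row.
        "TBLPROPERTIES ('skip.header.line.count' = '1')"
    )


if __name__ == "__main__":
    # Hypothetical names; substitute the ones used in the README.
    print(create_database_ddl("real_estate_db"))
    print(create_table_ddl("real_estate_db", "manhattan_sales",
                           "s3://my-lab-bucket/real-estate/"))
```

The partition column appears only in `PARTITIONED BY`, not in the regular column list, because its values come from the folder names rather than from the CSV contents.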

Add Partition Metadata

Load the partitions and confirm they have been loaded using the third command in the README file.
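
As a rough illustration of this step, the sketch below generates the kind of statements commonly used to load and list partitions in Athena. The README's third command may differ; the table name, partition column, and S3 location here are hypothetical.

```python
from typing import List


def load_partitions_statements(table: str) -> List[str]:
    """Statements that discover Hive-style partitions, then list them."""
    return [
        # Scans the table location for column=value folders and registers them.
        f"MSCK REPAIR TABLE {table}",
        # Lists the partitions now recorded in the table metadata.
        f"SHOW PARTITIONS {table}",
    ]


def add_partition_statement(table: str, column: str, value: str,
                            location: str) -> str:
    """Alternative: register one partition explicitly by its S3 location."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({column} = '{value}') LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical table, column, and location names.
    for stmt in load_partitions_statements("manhattan_sales"):
        print(stmt)
    print(add_partition_statement(
        "manhattan_sales", "neighborhood", "chelsea",
        "s3://my-lab-bucket/real-estate/neighborhood=chelsea/"))
```

`MSCK REPAIR TABLE` only picks up folders named in the `column=value` form; folders with other names need the explicit `ALTER TABLE ... ADD PARTITION` form instead.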

Query Data Using SQL

Query the data using SQL. You can use several different SQL queries to explore the data; for example, you can use the fourth command in the README file.
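
One possible exploratory query is sketched below: average sale price by bedroom count within a single neighborhood. It is an illustration only, not the README's fourth command. The `price` and `bedrooms` columns come from this lab description, while the table name and the `neighborhood` partition column are assumptions.

```python
def avg_price_query(table: str, neighborhood: str,
                    partition_column: str = "neighborhood") -> str:
    """Return a query for average price per bedroom count in one partition."""
    # Double any single quotes so the value is a valid SQL string literal.
    safe_value = neighborhood.replace("'", "''")
    return (
        "SELECT bedrooms, COUNT(*) AS listings, "
        "ROUND(AVG(price), 2) AS avg_price\n"
        f"FROM {table}\n"
        f"WHERE {partition_column} = '{safe_value}'\n"
        "GROUP BY bedrooms\n"
        "ORDER BY bedrooms"
    )


if __name__ == "__main__":
    # Hypothetical table name and partition value.
    print(avg_price_query("real_estate_db.manhattan_sales", "chelsea"))
```

Filtering on the partition column lets Athena read only the matching folder in S3 instead of scanning every CSV file, which reduces the amount of data scanned.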

Additional Resources

An analysis team at a real estate firm needs to perform exploratory analysis on different neighborhoods in Manhattan. They have assigned you the task of uploading the data to Amazon S3 and querying it in Amazon Athena as efficiently as possible.

Before you start the lab, get the real estate source files from this lab's GitHub repository.

Make sure you are in the us-east-1 Region throughout this lab.

Tip: The proposed solution for this hands-on lab uses the code snippets provided in the lab guide rather than the Create Table wizard. If you prefer to use the wizard, you can save time with its bulk add columns feature by entering the following: price double, bedrooms double, bathrooms double, sqft double, status string, address string.

