In this AWS hands-on lab, you’ll use Amazon Athena to query sample data on houses sold in Manhattan, stored in Amazon S3. To do so, you’ll first upload the sample data to Amazon S3, organize it into Hive-style partition folders, create a table over it in Amazon Athena, and finally use the Amazon Athena query editor to run SQL queries against the real estate data.
Successfully complete this lab by achieving the following learning objectives:
- Populate S3 with Manhattan Real Estate Data
Download the CSV files from this lab’s GitHub repository, and upload them to Amazon S3.
Note: For partitioning purposes, make sure to create a separate folder for each CSV file.
- Set Up Amazon Athena
Create a folder in Amazon S3 and set that folder as the query result location in Amazon Athena.
- Create a Table from the S3 Bucket Metadata
Create a database and a table using the first and second queries found in the README file. If everything is successful, you should see the table listed under Tables and views.
- Add Partition Metadata
Load the partitions, then confirm they have been loaded, using the third command in the README file.
- Query Data Using SQL
Query the data using SQL. You can use several different SQL queries to explore the data; for example, you can use the fourth command in the README file.
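
To make the partitioning step more concrete, here is a minimal Python sketch of what a Hive-style layout looks like, where each CSV file sits in its own `column=value` folder. The prefix, partition column, file names, and table name below are hypothetical placeholders, not the lab's actual values; the lab's README defines the real ones. The second helper only builds the text of two standard Athena statements (`MSCK REPAIR TABLE` and `SHOW PARTITIONS`) that are commonly used to load and list Hive-style partitions; the README's third command may differ.

```python
def hive_partition_key(prefix: str, column: str, value: str, filename: str) -> str:
    """Build an S3 object key in Hive style: <prefix>/<column>=<value>/<filename>."""
    # Strip stray slashes so the key never starts with "/" or contains "//".
    return f"{prefix.strip('/')}/{column}={value}/{filename}"


def partition_statements(table: str) -> list[str]:
    """Return Athena statements that load Hive-style partitions, then list them."""
    return [f"MSCK REPAIR TABLE {table}", f"SHOW PARTITIONS {table}"]


if __name__ == "__main__":
    # Hypothetical prefix, partition column, and file names; substitute the lab's own.
    for value, name in [("a", "sales_a.csv"), ("b", "sales_b.csv")]:
        print(hive_partition_key("manhattan", "part", value, name))

    # Hypothetical table name; paste the resulting statements into the query editor.
    for statement in partition_statements("manhattan_sales"):
        print(statement)
```

Keeping one folder per CSV file matters because Athena treats each `column=value` folder as one partition, so a query that filters on the partition column only scans the matching folders.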