Welcome to this hands-on AWS lab, where we’ll use SQL in Amazon Athena to query data stored in Amazon S3. Let’s get started!
Learning Objectives
Successfully complete this lab by achieving the following learning objectives:
- Create a Table from S3 Bucket Metadata
  - Navigate to Amazon Athena.
  - Create a table from S3 bucket data using the following values:
    - Database: aws_service_logs
    - Table Name: cf_access_optimized
    - Location of Input Data Set: s3://<S3_BUCKET_NAME>/
    - Data Format: Parquet
    - Bulk add columns using this data:
      time timestamp, location string, bytes bigint, requestip string, method string, host string, uri string, status int, referrer string, useragent string, querystring string, cookie string, resulttype string, requestid string, hostheader string, requestprotocol string, requestbytes bigint, timetaken double, xforwardedfor string, sslprotocol string, sslcipher string, responseresulttype string, httpversion string
    - Create the following partitions:
      - Column Name: year
      - Column Name: month
      - Column Name: day
  - Click Create table.
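The form above corresponds roughly to a single CREATE EXTERNAL TABLE statement. As a minimal sketch (not the console's own code), the Python below turns the bulk-add column list and the three partition names into that DDL. Assumptions: the lab does not give partition column types, so string is used for year, month, and day; the helpers `parse_bulk_columns` and `build_create_table` are hypothetical; the bucket name stays as the `<S3_BUCKET_NAME>` placeholder.

```python
from typing import List, Tuple

# Column list copied from the "Bulk add columns" step of the lab.
BULK_COLUMNS = (
    "time timestamp, location string, bytes bigint, requestip string, "
    "method string, host string, uri string, status int, referrer string, "
    "useragent string, querystring string, cookie string, resulttype string, "
    "requestid string, hostheader string, requestprotocol string, "
    "requestbytes bigint, timetaken double, xforwardedfor string, "
    "sslprotocol string, sslcipher string, responseresulttype string, "
    "httpversion string"
)


def parse_bulk_columns(spec: str) -> List[Tuple[str, str]]:
    """Split 'name type, name type, ...' into (name, type) pairs."""
    pairs = []
    for item in spec.split(","):
        name, col_type = item.split()  # raises if an entry is malformed
        pairs.append((name, col_type))
    return pairs


def build_create_table(database, table, location, columns, partitions):
    """Render a CREATE EXTERNAL TABLE statement for Parquet data.

    Column names are wrapped in backticks because names such as `time`
    can collide with reserved words in Athena DDL.
    """
    column_lines = ",\n".join(f"  `{name}` {col_type}" for name, col_type in columns)
    partition_list = ", ".join(f"`{name}` {col_type}" for name, col_type in partitions)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{column_lines}\n"
        ")\n"
        f"PARTITIONED BY ({partition_list})\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


columns = parse_bulk_columns(BULK_COLUMNS)
# Assumption: partition types are not stated in the lab; string is used here.
partitions = [("year", "string"), ("month", "string"), ("day", "string")]
ddl = build_create_table(
    "aws_service_logs", "cf_access_optimized", "s3://<S3_BUCKET_NAME>/",
    columns, partitions,
)
print(ddl)
```

Partition columns go in PARTITIONED BY rather than in the main column list, which is why year, month, and day do not appear among the 23 data columns.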
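- Add Partition Metadata
  - Open a new query tab.
  - Run the following query to scan the table location and add its partitions to the table metadata:
      MSCK REPAIR TABLE aws_service_logs.cf_access_optimized
  - Verify that the partitions were loaded by counting the rows (a partitioned table returns no rows until its partitions are added):
      SELECT count(*) AS rowcount FROM aws_service_logs.cf_access_optimized
  - Run the following query to preview a sample of the data:
      SELECT * FROM aws_service_logs.cf_access_optimized LIMIT 10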
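MSCK REPAIR TABLE discovers partitions by looking for Hive-style key=value folders (for example year=2018/) under the table's location. The Python below is a minimal sketch of that discovery step only, assuming objects are laid out as year=YYYY/month=MM/day=DD/<file> relative to the table location. The object keys are made up for illustration, and `partition_of` and `discover_partitions` are hypothetical helpers, not part of any AWS SDK.

```python
from typing import List, Optional, Tuple

# Partition columns, in the order they are declared on the table.
PARTITION_KEYS = ("year", "month", "day")


def partition_of(key: str) -> Optional[Tuple[str, ...]]:
    """Return (year, month, day) for a key like 'year=2018/month=11/day=02/f.parquet'.

    Keys are relative to the table LOCATION. Returns None when the key is not
    laid out as Hive-style key=value folders in the declared order.
    """
    segments = key.split("/")
    # Simplification: exactly one folder per partition column, then the file name.
    if len(segments) != len(PARTITION_KEYS) + 1:
        return None
    values = []
    for expected, segment in zip(PARTITION_KEYS, segments):
        name, sep, value = segment.partition("=")
        if not sep or name != expected:
            return None
        values.append(value)
    return tuple(values)


def discover_partitions(keys: List[str]) -> List[Tuple[str, ...]]:
    """Collect the distinct partitions found among the object keys."""
    found = {p for p in (partition_of(k) for k in keys) if p is not None}
    return sorted(found)


# Hypothetical object keys; the lab bucket's actual contents are not shown.
KEYS = [
    "year=2018/month=11/day=02/part-00000.parquet",
    "year=2018/month=11/day=02/part-00001.parquet",
    "year=2018/month=11/day=03/part-00000.parquet",
    "2018/11/04/part-00000.parquet",  # not key=value, so it is skipped
]

print(discover_partitions(KEYS))
# [('2018', '11', '02'), ('2018', '11', '03')]
```

Folders that are not in key=value form, like the last key, are not picked up by MSCK REPAIR TABLE; such layouts need ALTER TABLE ADD PARTITION instead.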
- Query the Total Bytes Served in a Date Range
  - Run the following query:
      SELECT SUM(bytes) AS total_bytes FROM aws_service_logs.cf_access_optimized WHERE time BETWEEN TIMESTAMP '2018-11-02' AND TIMESTAMP '2018-11-03'
  - Observe that the value for total_bytes equals 87310409.
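BETWEEN is inclusive at both ends, and a date-only literal such as TIMESTAMP '2018-11-03' denotes midnight at the start of that day. The query above therefore covers all of 2 November 2018 plus any request logged at exactly 2018-11-03 00:00:00. The Python sketch below reproduces that filter on a few made-up rows (not the lab's data) and contrasts it with a half-open range; the function names are illustrative only.

```python
from datetime import datetime

# Hypothetical (time, bytes) rows for illustration; NOT taken from the lab data.
ROWS = [
    (datetime(2018, 11, 1, 23, 59, 59), 500),   # before the range
    (datetime(2018, 11, 2, 0, 0, 0), 1000),     # lower bound: included
    (datetime(2018, 11, 2, 13, 30, 0), 2500),   # inside the range
    (datetime(2018, 11, 3, 0, 0, 0), 4000),     # upper bound: included by BETWEEN
    (datetime(2018, 11, 3, 0, 0, 1), 8000),     # after the range
]

# TIMESTAMP '2018-11-02' and TIMESTAMP '2018-11-03' both mean midnight.
START = datetime(2018, 11, 2)
END = datetime(2018, 11, 3)


def total_bytes_between(rows, start, end):
    """Mirror SQL `time BETWEEN start AND end` (inclusive at both ends)."""
    return sum(size for ts, size in rows if start <= ts <= end)


def total_bytes_half_open(rows, start, end):
    """Mirror `time >= start AND time < end` (excludes the upper bound)."""
    return sum(size for ts, size in rows if start <= ts < end)


inclusive = total_bytes_between(ROWS, START, END)
half_open = total_bytes_half_open(ROWS, START, END)
print(inclusive, half_open)  # 7500 3500
```

If you want only 2 November, a half-open filter such as time >= TIMESTAMP '2018-11-02' AND time < TIMESTAMP '2018-11-03' avoids the boundary row; the expected value of 87310409 applies to the lab's BETWEEN query as written.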