Querying Data in Amazon S3 with Amazon Athena

1 hour
  • 3 Learning Objectives

About this Hands-on Lab

Welcome to this hands-on AWS lab, where we’ll query data stored in Amazon S3 with SQL queries in Amazon Athena. Let’s get started!

Learning Objectives

Successfully complete this lab by achieving the following learning objectives:

Create a Table from S3 Bucket Metadata
  1. Navigate to Amazon Athena.
  2. Create a table from S3 bucket data using the following values:
    • Database: aws_service_logs
    • Table Name: cf_access_optimized
    • Location of Input Data Set: s3://<S3_BUCKET_NAME>/
    • Data Format: Parquet
    • Columns: bulk add columns using this data:
        time timestamp, location string, bytes bigint, requestip string, method string, host string, uri string, status int, referrer string, useragent string, querystring string, cookie string, resulttype string, requestid string, hostheader string, requestprotocol string, requestbytes bigint, timetaken double, xforwardedfor string, sslprotocol string, sslcipher string, responseresulttype string, httpversion string
  3. Create the following partitions:
    • Column Name: year
    • Column Name: month
    • Column Name: day
  4. Click Create table.
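
For reference, the console steps above correspond roughly to a single CREATE EXTERNAL TABLE statement. The following is a minimal sketch, not the lab's official method: it builds that statement in Python from the bulk column list. The helper names (parse_columns, build_create_table) are hypothetical, and the partition column type (string) is an assumption, since the lab does not state it. The <S3_BUCKET_NAME> placeholder is left as-is.

```python
# Column list exactly as given in the lab's bulk-add step.
COLUMNS = (
    "time timestamp, location string, bytes bigint, requestip string, "
    "method string, host string, uri string, status int, referrer string, "
    "useragent string, querystring string, cookie string, resulttype string, "
    "requestid string, hostheader string, requestprotocol string, "
    "requestbytes bigint, timetaken double, xforwardedfor string, "
    "sslprotocol string, sslcipher string, responseresulttype string, "
    "httpversion string"
)

# Assumption: the lab does not give partition column types; string is a guess.
PARTITIONS = [("year", "string"), ("month", "string"), ("day", "string")]


def parse_columns(spec):
    """Split 'name type, name type, ...' into (name, type) pairs."""
    columns = []
    for item in spec.split(","):
        name, col_type = item.split()
        columns.append((name, col_type))
    return columns


def build_create_table(database, table, columns, partitions, location):
    """Return a CREATE EXTERNAL TABLE statement for a Parquet-backed table."""
    # Backticks quote every column name so reserved words cannot clash.
    col_lines = ",\n".join(f"  `{name}` {col_type}" for name, col_type in columns)
    part_list = ", ".join(f"`{name}` {col_type}" for name, col_type in partitions)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{col_lines}\n"
        ")\n"
        f"PARTITIONED BY ({part_list})\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


DDL = build_create_table(
    "aws_service_logs",
    "cf_access_optimized",
    parse_columns(COLUMNS),
    PARTITIONS,
    "s3://<S3_BUCKET_NAME>/",  # placeholder from the lab, fill in your bucket
)

if __name__ == "__main__":
    print(DDL)
```

Running the script prints a statement you could paste into the Athena query editor instead of using the wizard, once the placeholder is replaced and the partition types are confirmed.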
Add Partition Metadata
  1. Open a new query tab.

  2. Run the following query:

    MSCK REPAIR TABLE aws_service_logs.cf_access_optimized
  3. Verify the partitions were created with the following query:

    SELECT count(*) AS rowcount FROM aws_service_logs.cf_access_optimized
  4. Run the following query:

    SELECT * FROM aws_service_logs.cf_access_optimized LIMIT 10
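
MSCK REPAIR TABLE works by scanning the table's S3 location for Hive-style key=value prefixes and registering each one it finds as a partition. The sketch below imitates that discovery step locally in Python to show the idea. The object keys are made up for illustration; the lab does not list the bucket's actual layout, so treat the year=/month=/day= structure and the file names as assumptions.

```python
# Hypothetical object keys; the real bucket contents are not shown in the lab.
SAMPLE_KEYS = [
    "year=2018/month=11/day=02/part-0000.parquet",
    "year=2018/month=11/day=02/part-0001.parquet",
    "year=2018/month=11/day=03/part-0000.parquet",
    "README.txt",  # no partition prefix, so it is ignored
]

PARTITION_COLUMNS = ("year", "month", "day")


def partition_from_key(key, partition_columns=PARTITION_COLUMNS):
    """Return the partition values encoded in a key, or None if incomplete."""
    values = {}
    for segment in key.split("/"):
        name, sep, value = segment.partition("=")
        if sep and name in partition_columns:
            values[name] = value
    if set(values) != set(partition_columns):
        return None
    return tuple(values[column] for column in partition_columns)


def discover_partitions(keys):
    """Collect the distinct partitions found across all keys, sorted."""
    found = set()
    for key in keys:
        partition = partition_from_key(key)
        if partition is not None:
            found.add(partition)
    return sorted(found)


PARTITIONS_FOUND = discover_partitions(SAMPLE_KEYS)

if __name__ == "__main__":
    for year, month, day in PARTITIONS_FOUND:
        print(f"year={year}/month={month}/day={day}")
```

If the row count in step 3 comes back as zero, a likely cause is that the partitions were not registered, for example because the prefixes under the table location do not follow the key=value pattern.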
Query the Total Bytes Served in a Date Range
  1. Perform the following query:

    SELECT SUM(bytes) AS total_bytes
    FROM aws_service_logs.cf_access_optimized
    WHERE time BETWEEN TIMESTAMP '2018-11-02' AND TIMESTAMP '2018-11-03'
  2. Observe that the value of total_bytes equals 87310409.
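
Two details of the query above are easy to miss: TIMESTAMP '2018-11-02' means midnight at the start of that day, and BETWEEN includes both endpoints, so a row stamped exactly 2018-11-03 00:00:00 is counted. The sketch below demonstrates the boundary behavior with Python's built-in sqlite3 module and made-up rows. SQLite is only a stand-in here: it has no TIMESTAMP literal, so timestamps are stored as ISO-formatted text, and the byte counts are invented rather than taken from the lab data.

```python
import sqlite3

# Invented rows (time, bytes) chosen to sit on and around the range boundaries.
ROWS = [
    ("2018-11-01 23:59:59", 100),  # just before the range: excluded
    ("2018-11-02 00:00:00", 200),  # exactly the lower bound: included
    ("2018-11-02 12:30:00", 300),  # inside the range: included
    ("2018-11-03 00:00:00", 400),  # exactly the upper bound: included
    ("2018-11-03 00:00:01", 500),  # just after the range: excluded
]


def total_bytes_between(rows, start, end):
    """Sum bytes for rows whose time falls in the inclusive range [start, end]."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE cf_access_optimized (time TEXT, bytes INTEGER)")
        conn.executemany("INSERT INTO cf_access_optimized VALUES (?, ?)", rows)
        (total,) = conn.execute(
            "SELECT SUM(bytes) AS total_bytes "
            "FROM cf_access_optimized "
            "WHERE time BETWEEN ? AND ?",
            (start, end),
        ).fetchone()
        return total
    finally:
        conn.close()


total_bytes = total_bytes_between(
    ROWS, "2018-11-02 00:00:00", "2018-11-03 00:00:00"
)

if __name__ == "__main__":
    print(total_bytes)
```

To count only requests made on 2018-11-02, one option is to replace BETWEEN with a half-open range: time >= TIMESTAMP '2018-11-02' AND time < TIMESTAMP '2018-11-03'. Note that doing so may change the total from the value the lab expects.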

Additional Resources

Make sure you are in the us-east-1 region throughout this lab.

When prompted to add column definitions in bulk, you may use this data to save time:

time timestamp, location string, bytes bigint, requestip string, method string, host string, uri string, status int, referrer string, useragent string, querystring string, cookie string, resulttype string, requestid string, hostheader string, requestprotocol string, requestbytes bigint, timetaken double, xforwardedfor string, sslprotocol string, sslcipher string, responseresulttype string, httpversion string

The queries used in this lab:

MSCK REPAIR TABLE aws_service_logs.cf_access_optimized

SELECT count(*) AS rowcount FROM aws_service_logs.cf_access_optimized

SELECT * FROM aws_service_logs.cf_access_optimized LIMIT 10

SELECT * FROM aws_service_logs.cf_access_optimized ORDER BY time DESC LIMIT 10

SELECT * FROM aws_service_logs.cf_access_optimized ORDER BY time ASC LIMIT 10

SELECT * FROM aws_service_logs.cf_access_optimized WHERE time BETWEEN TIMESTAMP '2018-11-02' AND TIMESTAMP '2018-11-03'

SELECT SUM(bytes) AS total_bytes FROM aws_service_logs.cf_access_optimized WHERE time BETWEEN TIMESTAMP '2018-11-02' AND TIMESTAMP '2018-11-03'

In this lab, you'll analyze optimized CloudFront access logs using Amazon Athena. Athena is an interactive query service that can help you analyze data from various AWS services, including CloudFront.

Raw CloudFront access logs are stored in a tab-delimited text format called the Web Distribution Log File Format.

More information about Apache Parquet can be found here.

More information about Glue and partitioning data can be found here.
