A Cloud Guru: Labs

Querying Data in Amazon S3 with Amazon Athena

Welcome to this hands-on AWS lab, where we'll use SQL in Amazon Athena to query data stored in Amazon S3. Let's get started!

Path Info

Level: Intermediate
Duration: 1h 0m
Published: Nov 06, 2020

Table of Contents

  1. Challenge

    Create a Table from S3 Bucket Metadata

    1. Navigate to Amazon Athena.
    2. Configure settings to send query results to the S3 bucket.
    3. Create a table from the S3 bucket data using the following values:
      • Database: aws_service_logs
      • Table Name: cf_access_optimized
      • Location of Input Data Set: s3://<S3_BUCKET_NAME>/
      • Data Format: Parquet
    4. Bulk add columns using this data:
      time timestamp, location string, bytes bigint, requestip string, method string, host string, uri string, status int, referrer string, useragent string, querystring string, cookie string, resulttype string, requestid string, hostheader string, requestprotocol string, requestbytes bigint, timetaken double, xforwardedfor string, sslprotocol string, sslcipher string, responseresulttype string, httpversion string
      
    5. Create the following partitions:
      • Column Name: year
      • Column Name: month
      • Column Name: day
    6. Click Create table.
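
The console steps above correspond roughly to DDL that could be run in the Athena query editor instead. This is a sketch, not the exact statement the console generates. The string type for the partition columns is an assumption, since the lab lists only their names, and <S3_BUCKET_NAME> remains a placeholder for the lab's bucket.

```sql
-- Run each statement as its own query.

-- Create the database if it does not already exist.
CREATE DATABASE IF NOT EXISTS aws_service_logs;

-- External table over the Parquet data in the bucket.
-- Column names are backtick-quoted to avoid clashes with DDL keywords.
CREATE EXTERNAL TABLE IF NOT EXISTS aws_service_logs.cf_access_optimized (
  `time` timestamp,
  `location` string,
  `bytes` bigint,
  `requestip` string,
  `method` string,
  `host` string,
  `uri` string,
  `status` int,
  `referrer` string,
  `useragent` string,
  `querystring` string,
  `cookie` string,
  `resulttype` string,
  `requestid` string,
  `hostheader` string,
  `requestprotocol` string,
  `requestbytes` bigint,
  `timetaken` double,
  `xforwardedfor` string,
  `sslprotocol` string,
  `sslcipher` string,
  `responseresulttype` string,
  `httpversion` string
)
-- Assumption: partition columns typed as string.
PARTITIONED BY (`year` string, `month` string, `day` string)
STORED AS PARQUET
LOCATION 's3://<S3_BUCKET_NAME>/';
```

A table created this way has no partitions registered yet, so queries return no rows until the partition metadata is added in the next challenge.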
  2. Challenge

    Add Partition Metadata

    1. Open a new query tab.

    2. Run the following query:

      MSCK REPAIR TABLE aws_service_logs.cf_access_optimized
      
    3. Confirm that the row count equals 207535 with the following query:

      SELECT count(*) AS rowcount FROM aws_service_logs.cf_access_optimized
      
    4. Verify the partitions were created with the following query:

      SELECT * FROM aws_service_logs.cf_access_optimized ORDER BY time DESC LIMIT 10
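
MSCK REPAIR TABLE scans the table location for Hive-style key=value prefixes and registers each one it finds as a partition. Assuming the lab bucket is laid out that way (the command requires it), the registered partitions could also be inspected more directly. The following is an optional sketch, not part of the lab's graded steps.

```sql
-- Run each statement as its own query.

-- List the partitions registered in the catalog for the table.
SHOW PARTITIONS aws_service_logs.cf_access_optimized;

-- Count rows per partition; the counts should add up to the
-- total returned by the lab's count(*) query.
SELECT year, month, day, count(*) AS rowcount
FROM aws_service_logs.cf_access_optimized
GROUP BY year, month, day
ORDER BY year, month, day;
```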
      
  3. Challenge

    Query the Total Bytes Served in a Date Range

    1. Observe the bytes column in the results of the following query:

      SELECT * FROM aws_service_logs.cf_access_optimized WHERE time BETWEEN TIMESTAMP '2018-11-02' AND TIMESTAMP '2018-11-03'
      
    2. Run the following query:

      SELECT SUM(bytes) AS total_bytes
      FROM aws_service_logs.cf_access_optimized
      WHERE time BETWEEN TIMESTAMP '2018-11-02' AND TIMESTAMP '2018-11-03'
      
    3. Observe that the value of total_bytes equals 87310409.
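
BETWEEN is inclusive at both ends, so the query above also counts any row stamped exactly 2018-11-03 00:00:00. A half-open range is a common alternative when the intent is "all of November 2 and nothing else". The sketch below is a variation on the lab's query, not a replacement for it. Whether its result differs from the lab's total depends on whether the data contains rows at that exact instant, which the lab does not state.

```sql
-- Half-open range: includes 2018-11-02 00:00:00,
-- excludes 2018-11-03 00:00:00.
SELECT SUM(bytes) AS total_bytes
FROM aws_service_logs.cf_access_optimized
WHERE time >= TIMESTAMP '2018-11-02'
  AND time <  TIMESTAMP '2018-11-03';
```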

The Cloud Content team comprises subject matter experts hyper-focused on services offered by the leading cloud vendors (AWS, GCP, and Azure), as well as cloud-related technologies such as Linux and DevOps. The team is thrilled to share their knowledge to help you build modern tech solutions from the ground up, secure and optimize your environments, and so much more!
