AWS Certified Solutions Architect - Professional 2020

Sign Up Free or Log In to participate!

Lab final answer *SPOILERS*

I got a completely different result. I took the expert route, and only used Athena + the SQL queries in the resources section. I then downloaded my results and sorted them quickly with OpenOffice Calc. I didn’t think to use Glue at all.

The location with the highest average O3 that day was:

Riverside-San Bernardino-Ontario 33.708566 -116.215394 2018-10-09 20:00:00.000 0.069 ppm

And about 40 other locations that had a higher concentration that day than Tulare, which was only 0.061 ppm.

Mgstaley

I got the same answer as you. I used Athena and SQL and got the Create Table code from the OpenAQ website. (But wow, Glue is super awesome! It makes life so much easier!)

Hesham Amin

I’ve done the same. What’s missing is calculating the average over the whole day. Then the result matches Scott’s answer.

5 Answers

Hi Cecil,

Good job on taking the Expert path!  It’s hard to venture what might be different between the path I took and the path you took, but the real point was to dig in and provide you a reason to get your hands dirty…which you did so good job!

Some people who’ve gotten different results had left of some aggregations, average or max in their queries…if I recall the data, there were multiple readings per day….but it’s been almost a year since I dug into that case.

Congrats on finding your own path!

–Scott

I think that people are getting different answers, because when using QuickSight we’re visualizing the average of all the readings throughout the day and that is when Tulare has the highest value. While when we use just Athena to order the value entries in descending order, we will get other answers.

I thought I was going crazy because I was getting the same answer as DarkCecil. @Edu is correct, if you average the readings by city/location for the entire day, then you get TULARE:

SELECT city, location, avg(value) AS avgo3

FROM openaq_20181009

WHERE country = ‘US’

AND parameter = ‘o3’

GROUP BY city, location

ORDER BY 3 desc

I also choose the expert path, my way was:

1. Find the data source free for quality air, I got OpenQA

2. Export data from their S3 bucket to a table in Athena

`

CREATE EXTERNAL TABLE openaq(

date struct<utc:string,local:string> COMMENT ‘from deserializer’,

parameter string COMMENT ‘from deserializer’,

location string COMMENT ‘from deserializer’,

value float COMMENT ‘from deserializer’,

unit string COMMENT ‘from deserializer’,

city string COMMENT ‘from deserializer’,

attribution array<struct<name:string,url:string>> COMMENT ‘from deserializer’,

averagingperiod struct<unit:string,value:float> COMMENT ‘from deserializer’,

coordinates struct<latitude:float,longitude:float> COMMENT ‘from deserializer’,

country string COMMENT ‘from deserializer’,

sourcename string COMMENT ‘from deserializer’,

sourcetype string COMMENT ‘from deserializer’,

mobile string COMMENT ‘from deserializer’)

ROW FORMAT SERDE

‘org.openx.data.jsonserde.JsonSerDe’

STORED AS INPUTFORMAT

‘org.apache.hadoop.mapred.TextInputFormat’

OUTPUTFORMAT

‘org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat’

LOCATION

‘s3://openaq-fetches/realtime-gzipped’

TBLPROPERTIES (

‘transient_lastDdlTime’=’1518373755’) 

`

3. Looking for a query to get the last question.

`

SELECT *

FROM

(SELECT city,

unit,

AVG(value) AS AvgO3

FROM "sampledb"."openaq"

WHERE country=’US’

AND parameter=’o3′

AND date.utc LIKE ‘%2018-10-09%’

GROUP BY city, unit) AS AVG_TABLE

WHERE AvgO3 =

(SELECT MAX(AvgO3)

FROM

(SELECT city,

unit,

AVG(value) AS AvgO3

FROM "sampledb"."openaq"

WHERE country=’US’

AND parameter=’o3′

AND date.utc LIKE ‘%2018-10-09%’

GROUP BY city, unit))

`

My result was 

1 TULARE ppm 0.056333333

As has been said before

🙂

Result can be differ but overall it was good knowledge and understanding of data flow and result

Sign In
Welcome Back!

Psst…this one if you’ve been moved to ACG!

Get Started
Who’s going to be learning?