1 Answers
Hi Brad,
If I remember correctly, the data is stored by data in the S3 bucket, so if you chose s3://openaq-fetches/realtime/2018-10-09/ as the source location for your air_quality table, you should not need to specify the date in the WHERE clause. I don’t do this in the VIEW creation.
If you’re selecting from the whole batch or more than one day, you might be pulling in some additional records. I also think your greater than / less than date clause might not be working as you’d expect. I think you need to CAST as Date. Check out this StackOverflow post:
https://stackoverflow.com/questions/51269919/athena-greater-than-condition-in-date-column
You might also try BETWEEN…something like this: BETWEEN date ‘2014-07-05’ AND date ‘2014-08-05’
–Scott
I would agree my date is not elegant but I was having issues with it for some reason. Besides that though the dates in the date field were accurate and it was only choosing o3 data. I tore down my environment already but it is not hard to run the query again. Either way it should work in theory, I never claimed to be a SQL expert.
Hah, no worries Brad. It also occured to me that you might also need an aggegration in there somewhere like a SUM since I think there were multiple readings per day.
I also got different results for the city and am curious why. the ddl I used was from https://gist.github.com/jflasher/573525aff9a5d8a966e5718272ceb25a
tried both local and utc
SELECT city, AVG(value) as value FROM "sampledb"."openaq" where date.local like ‘2018-10-09%’ and parameter like ‘o3’ and country = ‘US’ group by city, value order by value,city;