1 Answer
Hi Sundeepan,
AWS has a listing of public data sets at https://registry.opendata.aws/. This is where I found the OpenAQ data set. It’s a great resource if you want to try stuff that requires a large amount of data, like a data lake or machine learning.
–Scott
Hi Scott, I can see the ARN for the bucket, but I'm not following how you were able to translate that into the URL (other than the fact that you provided it).
https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html
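If your bucket is in the US East region, the endpoint is BUCKET-NAME.s3.amazonaws.com. For any other region, or for a region-specific endpoint, it is BUCKET-NAME.s3-REGION.amazonaws.com, where BUCKET-NAME and REGION stand for your bucket's name and its region code.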
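To make the ARN-to-URL translation concrete, here is a minimal Python sketch of the rule described above. The helper names `bucket_from_arn` and `s3_bucket_url` are made up for illustration, and `eu-west-1` in the usage example is only a sample region, not the actual region of any bucket mentioned here. The dash form (`s3-REGION`) follows the comment above; AWS also documents a dot form (`s3.REGION`), so treat this as a sketch rather than the definitive endpoint rule.

```python
S3_ARN_PREFIX = "arn:aws:s3:::"


def bucket_from_arn(arn):
    """Extract the bucket name from an S3 ARN such as arn:aws:s3:::my-bucket/some/key."""
    if not arn.startswith(S3_ARN_PREFIX):
        raise ValueError(f"not an S3 ARN: {arn!r}")
    # Everything after the prefix is "bucket" or "bucket/key"; keep only the bucket.
    return arn[len(S3_ARN_PREFIX):].split("/", 1)[0]


def s3_bucket_url(bucket, region=None):
    """Build a virtual-hosted-style URL for a bucket (hypothetical helper).

    With no region (or us-east-1) the global endpoint is used; otherwise the
    region-specific dash-style endpoint from the comment above is used.
    """
    if region is None or region == "us-east-1":
        host = f"{bucket}.s3.amazonaws.com"
    else:
        host = f"{bucket}.s3-{region}.amazonaws.com"
    return f"https://{host}/"


if __name__ == "__main__":
    bucket = bucket_from_arn("arn:aws:s3:::openaq-fetches")
    print(s3_bucket_url(bucket))               # global endpoint
    print(s3_bucket_url(bucket, "eu-west-1"))  # sample region, for illustration only
```

So the only part of the ARN that ends up in the URL is the bucket name; the rest of the host comes from the region.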
I keep getting access denied errors: [d8e334c7-9007-4aa6-92a6-9d81636fe0aa] ERROR : Error Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 99510427A1E9045E; S3 Extended Request ID: Jwqetwre85BXV+cAgj3+e8nfwghKX0c+GPKFYMG7ES3g9NCPv0mta7Wc53+bbc3A/zuwn69DEX0=) retrieving file at s3://openaq-fetches/realtime/2019-03-03/1551571266.ndjson. Tables created did not infer schemas from this file.
I had a similar issue, then figured out that the role that was created had been set up for a different date (S3 bucket), while I was trying to run the crawler for another date.
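If you suspect the same mismatch, one quick sanity check is to compare the S3 path in the error message against the resource pattern in the role's policy. The sketch below is a rough local approximation, not how IAM itself evaluates policies: `arn_covers` is a hypothetical helper, the `2019-03-02` pattern is an invented example of a role scoped to a different date, and Python's `fnmatchcase` is used as a stand-in for IAM's `*` and `?` wildcards.

```python
from fnmatch import fnmatchcase


def arn_covers(resource_pattern, bucket, key):
    """Return True if the policy resource pattern matches the given object.

    Approximation only: fnmatchcase also treats "[...]" as a character class,
    which IAM does not, so avoid square brackets in the pattern.
    """
    target = f"arn:aws:s3:::{bucket}/{key}"
    return fnmatchcase(target, resource_pattern)


if __name__ == "__main__":
    # Object path taken from the Access Denied message above.
    bucket = "openaq-fetches"
    key = "realtime/2019-03-03/1551571266.ndjson"

    # Hypothetical policy resource scoped to a different date.
    old_pattern = "arn:aws:s3:::openaq-fetches/realtime/2019-03-02*"
    # Hypothetical policy resource scoped to the date being crawled.
    new_pattern = "arn:aws:s3:::openaq-fetches/realtime/2019-03-03*"

    print(arn_covers(old_pattern, bucket, key))
    print(arn_covers(new_pattern, bucket, key))
```

If the first check is the one that reflects your role, updating the policy resource (or creating a new role for the new path) should clear the 403.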