One thing about S3 I expected to hear and don’t think I did was the fact that updating an object is an all-or-nothing operation. So, if you’ve got a 1TB text file object and you want to append a single line to it, you have to replace the whole thing. Do I have this right? Is it too obvious to mention? It would have fit nicely in the "S3 is not a file system, even though the S3 URLs make it look like one" section of this excellent S3 overview.
Yep, you are correct. S3 is an object store, so each file stored in S3 is an object, like a little sealed box (or a big sealed box, in your 1TB example). If you want to append a new line to a large text file, you have to upload the full file again. And that’s that…for now.
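To make that concrete, here is a rough sketch of what an "append" amounts to today. The function name `append_line` and its parameters are made up for illustration. It assumes you pass in a boto3-style S3 client (for example `boto3.client("s3")`) and it only uses the `get_object` and `put_object` calls. It also assumes the object is small enough to hold in memory, so it would not work as-is for the 1TB example, which would need a multipart upload.

```python
def append_line(s3_client, bucket, key, line):
    """Emulate an append by rewriting the whole object (illustrative helper)."""
    # Download the entire existing object into memory.
    existing = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()

    # Build the new content locally: old bytes plus the new line.
    updated = existing + line.encode("utf-8") + b"\n"

    # Upload the full content again; the PUT replaces the old object wholesale.
    s3_client.put_object(Bucket=bucket, Key=key, Body=updated)

    # Return the size of what was uploaded, to show the cost of a one-line change.
    return len(updated)
```

Note that the number of bytes uploaded is the size of the whole object, not the size of the line you added.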
Things got weird when AWS introduced things like EMRFS (for EMR) and Athena. We can now read records inside files stored on S3, where we used to have to fetch the whole file and parse it locally.
I would not be surprised if AWS added some features like append or some form of diff or merge capabilities. That seems like a natural progression.
Thanks, Scott, for validating the atomicity issue. I tend to disagree with you on the possible append/diff/merge future of S3, but it’s an interesting prediction.
I should also add that I am really enjoying this course so far. I love your teaching style.
Sweet! Thank you!