AWS Certified Solutions Architect - Professional 2020


Comparing EBS volume failure rates to physical disk failure rates is misleading…

Given that the architecture exams are scenario-based, I doubt this particular topic would come up in the exam, but if you work with customers who manage their own on-prem storage then this is worth noting…  EBS volumes are virtual, not physical.  That means each EBS volume is spread over many physical disks.  How they are spread depends on how AWS manages its storage.

Given the numbers AWS provides for EBS availability and durability, I would have to guess they spread EBS volumes over somewhere around 20 physical disks at a minimum.  This is an old architecture that storage vendors like NetApp and EMC have used since the 1990s.  AWS uses the same commodity disks as everyone else, so they aren’t using "better physical disks" than everyone else.  They are just spreading the potential failures over many physical devices to get the 0.2% failure rate for EBS volumes.

Again, this is not something we will need to know for the exam, but it could come up in the real world…

2 Answers

Is every on-prem volume contained solely on a single disk drive?  As you say, storage vendors clearly spread volumes across multiple disks – so…wouldn’t those also be virtual volumes from the perspective of the systems that use those volumes?  Just like EC2 instances and EBS?

Are all of the on-prem physical systems automatically mirroring every volume?  I have no clue for sure, as I’ve never worked with any of the larger-scale storage options, which certainly would have mirroring and snapshot capabilities.  But mirroring enabled by default?  I doubt it.  EBS of course maintains two copies of the volume data.  Sounds like a key ingredient for a lower volume failure rate to me.
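To put rough numbers on the mirroring point, here is a back-of-envelope sketch in Python.  The 4% per-disk annual failure rate, the 20-disk spread, and the one-day rebuild window are assumptions picked for illustration, not AWS figures, and the function names are hypothetical.  It assumes disks fail independently, which real hardware does not strictly do.

```python
# Back-of-envelope failure model. All inputs are illustrative assumptions,
# not published AWS numbers, and disk failures are treated as independent.

def striped_afr(disk_afr: float, disks: int) -> float:
    """Annual loss probability for a volume spread over `disks` disks
    with no redundancy: losing any one disk loses the volume."""
    return 1 - (1 - disk_afr) ** disks


def mirrored_afr(disk_afr: float, rebuild_days: float) -> float:
    """Approximate annual loss probability for a two-copy mirror.

    Data is lost only if one copy fails (about 2 * disk_afr per year,
    since either copy can fail first) and the surviving copy then fails
    before the rebuild finishes.
    """
    p_second_fails_in_window = disk_afr * rebuild_days / 365
    return 2 * disk_afr * p_second_fails_in_window


if __name__ == "__main__":
    disk_afr = 0.04  # assumed annual failure rate of one commodity disk
    print(f"single disk:             {disk_afr:.4%}")
    print(f"20 disks, no redundancy: {striped_afr(disk_afr, 20):.4%}")
    print(f"2 copies, 1-day rebuild: {mirrored_afr(disk_afr, 1.0):.6%}")
```

Under these assumptions, spreading a volume over more disks without redundancy makes loss more likely, while a second copy with a short rebuild window cuts it by orders of magnitude.  So in this toy model the second copy, not the spreading by itself, is what lowers the volume failure rate.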

As long as AWS meets their promised numbers, who cares how it’s done in their datacenters?  I certainly don’t, and more importantly, I don’t need to.  I provision resources, they appear, I use them until I don’t want to anymore, and I no longer have to spend time worrying about hardware management, Throne be praised.

As far as I can determine, that’s "real world" in the cloud. 

So unless you have some other number sources to contradict the percentages referenced in the video and demonstrate how they are misleading (and I’d personally like to see where those numbers came from as well), I’d have to say that what’s really misleading is the title of your post.

Hi Steven.  Thanks for the feedback.

I was not questioning the actual numbers.  I was stating that when people look at the numbers it makes AWS look like they are doing something special to get such a low failure rate.  The comparison AWS uses is not valid since it’s comparing physical devices to virtualized storage volumes.  Virtualized volumes are made up of many physical disks.  That’s why the failure rates for physical disks are higher, and why the comparison is misleading.  I would use the comparison as a way to show how virtualizing storage is much better than physical devices on their own.

When I was talking about on-prem storage, I was referring to the people who manage the storage, not the storage itself.  I worked at a storage company for 9 years.  We were always skeptical about posted numbers for performance and availability.  That’s one reason why I made the original post.  I prefer to encourage others to question what they read.  Storage admins should look at the numbers AWS posted and know right away that they describe two different things (physical devices vs. virtualized storage).  I wanted to make sure people taking this course knew the distinction in case they run into it during their normal work day.

Steven Moran

Fair enough!
