What went wrong with the high availability of Facebook's services?

This is a general system design question about high availability. How can a service operating at such a huge scale have a global outage? What went wrong with their system design? What happened to the high-availability architecture? How could multiple availability zones black out at the same time? How can we avoid this in our own system design? Is there something that can take out all of our availability zones at once and bring down our HA architecture? How can we avoid that happening in AWS?

