Yesterday Route 53 suffered a several-hour outage, and it was reported by many news outlets, e.g. https://www.theregister.co.uk/2019/10/22/aws_dns_ddos/
How would one design around that, given that Route 53 was considered among the most reliable services, with a 100% availability SLA? Use DNS caches by increasing TTLs? Any comments or insights?
Well, a DDoS on DNS is a very big deal, since DNS services are highly redundant, especially one as high profile as AWS. But as we can see, it happened. When you are the king of the mountain, you become a hacker challenge/adventure.
Several mitigations are possible: longer TTLs, multiple DNS name servers, and multiple DNS providers.
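To see why longer TTLs help, here is a minimal sketch of resolver-side caching, assuming a hypothetical `lookup()` function that queries the authoritative servers. As long as a cached answer is within its TTL, clients never hit the authoritative service at all, so an outage shorter than the TTL is invisible to them:

```python
import time

class DnsCache:
    """Toy resolver cache illustrating TTL-based serving (not a real resolver)."""

    def __init__(self, lookup):
        self.lookup = lookup   # upstream query function (hypothetical)
        self.cache = {}        # name -> (answer, expires_at)

    def resolve(self, name, now=None):
        now = time.time() if now is None else now
        hit = self.cache.get(name)
        if hit and now < hit[1]:
            return hit[0]      # still within TTL: no upstream query needed
        # Cache miss or expired: only now do we depend on the authoritative side.
        answer, ttl = self.lookup(name)
        self.cache[name] = (answer, now + ttl)
        return answer
```

With, say, a 24-hour TTL, the authoritative servers could be unreachable for hours and most cached clients would never notice; the trade-off is that record changes also take up to a full TTL to propagate.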
Route 53 doesn’t support AXFR / zone transfer, so running multiple DNS providers is out when Route 53 is the primary. In a true enterprise environment, consider not using Route 53, at least not as the primary DNS service.
The environment I currently work in uses an enterprise DNS service as primary, and we do host some subdomains in Route 53, which makes separation of teams and departments easier, since different departments have their own AWS accounts. Also worth considering in a true enterprise environment: Route 53 does not currently support DNSSEC. That might not be a big deal to most people, but in our environment it was the main deal breaker.
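Delegating a subdomain like that is just an NS record set in the parent zone pointing at the child zone's name servers. As a sketch, here is the change-batch shape that boto3's `route53.change_resource_record_sets(ChangeBatch=...)` accepts for such a delegation; the subdomain and name server values below are placeholders, and the real values come from the NS record of the hosted zone created in the department's account:

```python
def delegation_change_batch(subdomain, child_name_servers, ttl=172800):
    """Build a Route 53 change batch delegating `subdomain` to another zone.

    The dict shape matches what boto3's change_resource_record_sets expects;
    actually applying it requires credentials and the parent zone's ID.
    """
    return {
        "Comment": f"Delegate {subdomain} to the department's hosted zone",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": subdomain,
                "Type": "NS",
                "TTL": ttl,
                "ResourceRecords": [{"Value": ns} for ns in child_name_servers],
            },
        }],
    }

# Placeholder subdomain and name servers, for illustration only.
batch = delegation_change_batch(
    "dept.example.com.",
    ["ns-123.awsdns-00.com.", "ns-456.awsdns-11.net."],
)
```

If the parent zone lives in the enterprise DNS service instead of Route 53, the same NS records would simply be entered there through that product's own interface.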