Simplifying the basics of Amazon’s complex DNS solution
“Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove it.” — Alan Perlis, Epigrams in Programming
DNS, or Domain Name System, was invented in the early 1980s during the dawn of the Internet. The distributed and dynamic naming solution translates host names into IP addresses — a foundational technology that made the World Wide Web possible.
Over the past three decades, DNS technology has been refined and refreshed to support far more sophisticated needs. The unfortunate truth is that DNS is now very complex — it’s often the most difficult part of learning how to configure websites and servers.
Amazon Route 53
Once you’ve wrapped your mind around the basics of DNS, you’ll find that the pragmatists of AWS suffer the additional complexities introduced by Amazon’s DNS solution: Route 53.
The Amazon Route 53 service introduces unique options such as “health checks” and “routing policies,” the kind of undifferentiated heavy lifting that many developers just want to avoid or remove.
Since ignoring the complexity of DNS isn’t an option, I’ve gotten a lot of requests from readers of my Last Week in AWS newsletter asking for some insights on using the service.
Let’s start with Route 53 health checks
Amazon Route 53 health checks monitor the health and performance of your web applications, web servers, and other resources.
Health checks are aimed at AWS endpoints: resources running within your AWS account, such as an Amazon EC2 instance. You’ll get free health checks for your first 50 endpoints, after which they’ll cost you 50¢ each per month.
You can point a health check at a specific URL and have it search the response for a particular string. Be aware that by default, you can expect health check requests to hit your endpoint every 2 to 3 seconds, given the large number of locations Route 53 checks from. If this is going to be a problem for your application, consider pointing the check at something other than a URL which exercises your entire application’s workflow.
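To make that last suggestion concrete, here is a minimal sketch of a cheap, dedicated health endpoint using only Python’s standard library. The `/healthz` path and the `OK-route53` marker string are hypothetical names chosen for illustration; nothing about Route 53 requires them.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

HEALTH_PATH = "/healthz"     # hypothetical path, pick whatever suits your app
HEALTH_BODY = b"OK-route53"  # hypothetical marker a string-match check could look for


def health_response(path):
    """Return (status, body) without touching databases, caches, or queues."""
    if path == HEALTH_PATH:
        return 200, HEALTH_BODY
    return 404, b"not found"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = health_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Checks arrive every few seconds; keep them out of the access log.
        pass


def make_server(host="0.0.0.0", port=8080):
    """Build the server; call .serve_forever() on the result to run it."""
    return ThreadingHTTPServer((host, port), HealthHandler)
```

The trade-off is that a check this shallow only proves the process is up and answering. Whether that is enough depends on what you want a failed check to mean.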
These health checks serve two purposes.
- The first goal is DNS failover. This ensures that only healthy endpoints are returned to requesters.
- The second is to generate CloudWatch metrics for alarms, trend reports, or triggering Lambda functions at certain thresholds.
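As a rough sketch of what a string-matching health check looks like through the API, the helper below builds the request for boto3’s Route 53 `create_health_check` call. The domain, path, search string, and thresholds are assumptions for illustration, and the client is passed in so the snippet does not need AWS credentials just to be read or tested.

```python
import uuid


def build_health_check_request(domain, path="/healthz", search_string="OK-route53"):
    """Assemble keyword arguments for the Route 53 create_health_check call.

    The path and search string defaults are hypothetical placeholders.
    """
    return {
        # CallerReference must be unique per request so retries are safe.
        "CallerReference": str(uuid.uuid4()),
        "HealthCheckConfig": {
            "Type": "HTTPS_STR_MATCH",      # HTTPS request plus a body string match
            "FullyQualifiedDomainName": domain,
            "Port": 443,
            "ResourcePath": path,
            "SearchString": search_string,
            "RequestInterval": 30,          # seconds; 10 is the faster option
            "FailureThreshold": 3,          # consecutive failures before "unhealthy"
        },
    }


def create_health_check(client, domain):
    """Create the check with a boto3 Route 53 client and return its ID."""
    response = client.create_health_check(**build_health_check_request(domain))
    return response["HealthCheck"]["Id"]
```

With boto3 installed and credentials configured, usage would look something like `create_health_check(boto3.client("route53"), "www.example.com")`.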
Next, let’s discuss Traffic Policies
When you create a resource record set, you choose a routing policy, which determines how Amazon Route 53 responds to queries.
Before we dive into specifics, a word of caution: DNS is a blunt instrument with respect to routing traffic. You can’t guarantee:
- Client libraries will respect your TTL
- Client resolvers will respect your TTL
- The IP address of the resolver is anywhere remotely close to the end user’s location
This speaks to the unfortunate truth that DNS is a very coarse tool for manipulating traffic. It’s reasonable for handling cross-region redirection. But once you’re in a particular location, a load balancer will almost always give you much finer-grained control over your traffic.
Route 53 Traffic Policies
Each traffic policy record will cost you $50 per month, so you’re likely going to want to be judicious with where you use these puppies.
The first and simplest option is Failover Routing. This is pretty straightforward — it routes traffic to one location when a health check passes, and to another location when that health check fails.
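For illustration, a failover pair boils down to two record sets with the same name, one marked PRIMARY and tied to a health check, the other marked SECONDARY. The sketch below builds the change batch that boto3’s `change_resource_record_sets` expects; the record name, IP addresses, TTL, and health check ID are all placeholders.

```python
def failover_change_batch(name, primary_ip, secondary_ip, primary_health_check_id, ttl=60):
    """Build a ChangeBatch containing a PRIMARY/SECONDARY failover pair of A records."""

    def record(role, ip, health_check_id=None):
        rrset = {
            "Name": name,
            "Type": "A",
            "SetIdentifier": f"{role.lower()}-{ip}",  # must be unique within the pair
            "Failover": role,                          # "PRIMARY" or "SECONDARY"
            "TTL": ttl,
            "ResourceRecords": [{"Value": ip}],
        }
        if health_check_id:
            rrset["HealthCheckId"] = health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Comment": "Failover pair (illustrative sketch)",
        "Changes": [
            # Traffic goes here while the health check passes...
            record("PRIMARY", primary_ip, primary_health_check_id),
            # ...and here once it fails.
            record("SECONDARY", secondary_ip),
        ],
    }
```

Applying it would be a call along the lines of `client.change_resource_record_sets(HostedZoneId=zone_id, ChangeBatch=batch)`, where `zone_id` is your own hosted zone. A short TTL is assumed here so that clients which do respect it fail over quickly.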
Next, we have Geolocation Routing. This routes traffic based on where the request originates, in a broad sense; think “right side of the ocean” more than “right side of town.”
“But wait! I use OpenDNS; how can Route 53 possibly know where I’m coming from if all of my requests come from centralized public resolvers?”
The answer lies within an extension to the DNS protocol, specifically edns-client-subnet. Assuming your resolver supports it, this passes a truncated portion of your originating IP address along to the name server you’re querying. It works very well, but you can’t depend on it being perfect.
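To get a feel for how coarse that location hint is, here is a small sketch of the truncation a resolver might apply before forwarding a client’s address. The /24 and /56 prefix lengths are assumptions based on common practice; individual resolvers may choose differently or send nothing at all.

```python
import ipaddress


def client_subnet(ip, v4_prefix=24, v6_prefix=56):
    """Return the network a resolver would plausibly send in place of the full address.

    The prefix lengths are assumed defaults, not something every resolver uses.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False zeroes out the host bits instead of raising an error.
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


print(client_subnet("203.0.113.77"))           # 203.0.113.0/24
print(client_subnet("2001:db8:1234:5678::1"))  # 2001:db8:1234:5600::/56
```

The name server sees a neighborhood rather than a house number, which is plenty for “right side of the ocean” decisions.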
Thirdly, we have multi-value routing. This returns multiple records and leaves it to the client to decide what to do with them; which records come back is controlled by health checks. Anything unhealthy gets pulled.
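As a toy model of that behavior (not Route 53’s actual implementation), the function below drops unhealthy records and returns a random selection of what remains, capped at eight answers per query. The health map is a stand-in for real health check results, and the sketch does not try to model what happens when every record is unhealthy.

```python
import random

MAX_ANSWERS = 8  # cap on the number of records returned per query


def multivalue_answer(records, rng=random):
    """Pick up to MAX_ANSWERS healthy IPs from a {ip: is_healthy} mapping."""
    healthy = [ip for ip, is_healthy in records.items() if is_healthy]
    return rng.sample(healthy, min(MAX_ANSWERS, len(healthy)))


records = {
    "192.0.2.1": True,
    "192.0.2.2": False,  # failed its health check, so it gets pulled
    "192.0.2.3": True,
}
print(sorted(multivalue_answer(records)))  # ['192.0.2.1', '192.0.2.3']
```

What the client does with several addresses (try the first, pick at random, retry on failure) is entirely up to the client.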
Lastly, there’s weighted routing. This lets you do things like “direct 5% of traffic to the new version of the site.” This use case is almost always better served with a load balancer, due to fun edge cases like “the user’s resolver cached the first result it got, and it serves 200,000 people.”
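That edge case is easy to see in a quick simulation. The sketch below assumes each resolver asks once, caches whatever it gets, and hands that answer to every user behind it; the weights and user counts are made-up numbers for illustration.

```python
import random


def pick_weighted(weights, rng):
    """Choose one target name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def share_on_new(users_per_resolver, weights, rng):
    """Fraction of users who land on "new" when each resolver caches one answer."""
    total = sum(users_per_resolver)
    on_new = sum(
        users
        for users in users_per_resolver
        if pick_weighted(weights, rng) == "new"
    )
    return on_new / total


weights = {"old": 95, "new": 5}

# Ten thousand tiny resolvers: the split hovers around the intended 5%.
print(share_on_new([1] * 10_000, weights, random.Random(1)))

# One giant resolver serving 200,000 people: it is all or nothing.
print(share_on_new([200_000], weights, random.Random(2)))
```

Real traffic sits somewhere between those two extremes, which is why the percentage you configure and the percentage you observe can differ noticeably.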
It’s handy to know how these policies work, and how they can fail. Don’t let my pessimism fool you; my use case is probably not yours. Only you understand the specifics of your application, and how it’s going to behave.
Unless you’re a DNS resolver, I welcome your comments and feedback!