Scalable DNS with AnyCast

A while back I was faced with a problem. The existing recursive DNS infrastructure in the datacenters was built on a traditional, common scaling design. We had a handful of DNS servers spread across the datacenters and stuck behind different flavors of load balancing. There were six or seven different resolver IPs that hosts were pointed at depending on various factors. They all ran ISC BIND. Some used Cisco load balancers, some Zeus, some ldirectord. These aren’t necessarily bad solutions to the problem of scaling and high availability, but we were running into problems as we reached the 40,000 queries per second range. Interrupt-driven kernels can livelock under that kind of load unless you use some kind of trickery to avoid it. Load balancers are expensive and can be a single point of failure if you don’t double up on hardware. The whole setup was complicated, expensive, and didn’t scale as easily as it could have.

There had to be a better way.

After some research and design discussions I came up with a pretty elegant solution that solved all the above problems and was radically cheaper and simpler.

The Solution
The final implementation used a simple, and at this point well proven, technology: AnyCast. At the time AnyCast was fairly new on the infrastructure scene but it had a number of advantages. First of all it’s simple. You need no special hardware beyond the layer 3 switches you probably already have. The implementation is just a few lines in IOS and you’re up and running with a route and SLA check. Since the switches handle all the load balancing you can get rid of all that expensive load balancing gear with its added complexity.

Now that you have no load balancers to worry about, you can just throw cheap, entry-level nodes around every datacenter and point the AnyCast routes at them. In our case we just used cheap dual core boxes with 2GB of RAM each. Nothing special. This is horizontal scaling at its finest.

The final trick was to get rid of ISC BIND and replace it with Unbound. Now, don’t get me wrong, ISC BIND works great and we could have continued to use it. A couple of considerations drove the decision for Unbound, however. First of all, in our testing it performed nearly an order of magnitude better on the same hardware. Second, it does one thing and does it well: recursive queries and caching. Because of that, its configuration is much simpler as well.

The deployment today consists of 16 AnyCast endpoints that are servicing an aggregate load of about 80,000 queries per second and could easily support much more than that. Initial performance testing showed that those cheap dual core hosts can support a query load of about 20,000 queries per second each.
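To put those numbers in perspective, here is a quick back-of-the-envelope sketch in Python. It uses only the figures quoted above (16 endpoints, roughly 80,000 queries per second in aggregate, roughly 20,000 per node) and assumes traffic is spread evenly across endpoints, which is only approximately true in practice. The function name is made up for illustration.

```python
import math

# Figures quoted in the text above.
ENDPOINTS = 16
PER_NODE_QPS = 20_000      # from initial performance testing
AGGREGATE_QPS = 80_000     # current aggregate load

# Theoretical ceiling if every node could run at its benchmark rate.
theoretical_capacity = ENDPOINTS * PER_NODE_QPS

# Fraction of that ceiling in use today.
utilization = AGGREGATE_QPS / theoretical_capacity

# Average load per node, ASSUMING a perfectly even spread.
avg_per_node = AGGREGATE_QPS / ENDPOINTS


def min_nodes_needed(total_qps, per_node_qps):
    """Smallest node count that can carry total_qps (hypothetical helper)."""
    return math.ceil(total_qps / per_node_qps)


if __name__ == "__main__":
    print(f"theoretical capacity: {theoretical_capacity} qps")
    print(f"utilization:          {utilization:.0%}")
    print(f"average per node:     {avg_per_node:.0f} qps")
    print(f"minimum nodes needed: {min_nodes_needed(AGGREGATE_QPS, PER_NODE_QPS)}")
```

On paper that is a lot of headroom, but it only holds if the load stays balanced and the benchmark rate carries over to production traffic.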

A nice, clean setup that is simple and cheap. Perfect.

Design Considerations
There are a few things to be aware of when designing a system like this, however.

  • CEF and XOR : Cisco gear has to decide where to route inbound queries when multiple endpoints have identical route distances. In practice this works as a kind of pseudo load balancing, but it is not round robin. The switch decides where to route packets by XOR’ing the last two octets of the source and destination IPs. The balance of traffic across a pool of endpoints ends up being pretty close to even, but it’s not perfect. You have to account for this slight traffic imbalance when capacity planning.
  • Number of endpoints : CEF on Cisco devices currently supports a maximum of 16 endpoints per device. In practice this isn’t a problem. It’s just something to be aware of.
  • More general capacity planning : If you were ever to lose the route to a switch, however unlikely, AnyCast will fail ALL the traffic destined for those endpoints over to the next lowest cost route. If you don’t plan for that, you’ll send too much traffic to the next cluster of nodes, which will DoS it and make its SLA checks fail. AnyCast will then send all the original traffic, plus all the traffic for cluster number 2, on to the third cluster, and so forth. Cascading failure of your whole infrastructure can happen.
  • Troubleshooting : It’s somewhat more complicated to know where traffic from a given DNS query is being routed. You have to dig around a lot to figure this out if there is a problem. It’s not impossible… just not as straightforward as designs that have a single cluster with a single virtual IP taking in all the inbound queries.
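Two of the points above, the XOR-based path selection and the cascading failure, are easy to play with in a few lines of Python. This is a rough sketch, not a model of what the switch actually does internally: mapping the XOR result onto an endpoint with a modulo is my assumption, the client addresses and the resolver IP are invented, and the cluster capacities and loads are illustrative numbers. Both function names are hypothetical.

```python
import ipaddress
from collections import Counter


def pick_endpoint(src_ip, dst_ip, n_endpoints):
    """XOR the last two octets of source and destination, then pick a bucket.

    The modulo step is an ASSUMPTION made for illustration only.
    """
    src_low = int(ipaddress.IPv4Address(src_ip)) & 0xFFFF  # last two octets
    dst_low = int(ipaddress.IPv4Address(dst_ip)) & 0xFFFF
    return (src_low ^ dst_low) % n_endpoints


def simulate_cascade(capacities, loads):
    """Cluster 0 loses its route; its traffic moves to the next cluster.

    Clusters are listed in order of increasing route cost. A cluster that is
    offered more than its capacity fails its SLA check and passes everything
    on. Returns (failed_cluster_indexes, index_that_absorbed_the_load), where
    the second value is None if every cluster failed.
    """
    failed = [0]
    carried = loads[0]
    for i in range(1, len(loads)):
        offered = loads[i] + carried
        if offered <= capacities[i]:
            return failed, i          # this cluster absorbs the extra traffic
        failed.append(i)              # overloaded: SLA check fails here too
        carried = offered             # all of it moves on to the next cluster
    return failed, None


if __name__ == "__main__":
    # Spread of 1,000 made-up clients across 16 endpoints.
    resolver = "10.0.0.53"            # hypothetical AnyCast address
    clients = [f"10.1.{i // 250}.{i % 250 + 1}" for i in range(1000)]
    spread = Counter(pick_endpoint(c, resolver, 16) for c in clients)
    print("busiest endpoint:", max(spread.values()),
          "quietest:", min(spread.values()))

    # Four clusters of 80,000 qps capacity each (illustrative numbers).
    print(simulate_cascade([80_000] * 4, [20_000] * 4))  # plenty of headroom
    print(simulate_cascade([80_000] * 4, [50_000] * 4))  # too little headroom
```

The second cascade run is the scenario to plan against: each cluster must be able to take its neighbor's full load on top of its own, or the failure keeps rolling downhill.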

Beyond those few considerations though, a setup like this is quite reliable, endlessly scalable, and offers the ability to have a single DNS query target for all hosts across all datacenters.

Pretty nice.