The DNS has been designed with several defence strategies to improve availability and resilience, on both the authoritative server side and the recursive resolver side.
Authoritative DNS servers (the ones that host entire zones, such as the Root servers) rely heavily on replication, in the form of name server replication, IP anycast, or both. Name server replication means that multiple servers serve the same zone, as with the root zone, which has 13 name servers. Each of them, in turn, can be further replicated using IP anycast, which allows the same IP address to be announced from multiple physical locations and lets BGP routing direct each client to one of those locations.
On the recursive resolver side, caching (sometimes using multiple levels) of DNS answers and retrying queries are the two main tactics used to provide resilience.
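To make the two resolver-side tactics concrete, here is a minimal sketch of a resolver that checks a TTL-based cache first and, on a miss, retries across its list of authoritative servers. It is an illustration only, not how any particular resolver is implemented: the `CachingResolver` class, the `max_attempts` parameter, and the convention that a server is a callable returning `(answer, ttl)` or `None` for a lost query are all assumptions made for this example.

```python
import time


class CachingResolver:
    """Toy recursive resolver: TTL cache plus retries (illustrative only)."""

    def __init__(self, authoritatives, max_attempts=3, clock=time.monotonic):
        # Each authoritative is a callable: name -> (answer, ttl_seconds),
        # or None when the query or its reply is lost (hypothetical interface).
        self.authoritatives = list(authoritatives)
        self.max_attempts = max_attempts
        self.clock = clock
        self.cache = {}  # name -> (answer, expiry_time)

    def resolve(self, name):
        now = self.clock()
        entry = self.cache.get(name)
        if entry is not None and entry[1] > now:
            # Cache hit: no query reaches the authoritative servers.
            return entry[0]

        # Cache miss or expired entry: retry, rotating through the servers.
        for attempt in range(self.max_attempts):
            server = self.authoritatives[attempt % len(self.authoritatives)]
            reply = server(name)
            if reply is not None:
                answer, ttl = reply
                self.cache[name] = (answer, now + ttl)
                return answer

        # Every attempt was lost; the client sees a failure.
        return None
```

In this sketch, a client is shielded from an unreachable authoritative server in two ways: while the cached answer is still within its TTL no query is sent at all, and once it expires the resolver tries again, possibly at a different server, before giving up.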
Given that various strategies are in use, we conducted a study to establish how they affect DNS resilience and latency in the wild, exploring both the client-side DNS user experience and server-side traffic. To do that, we carried out controlled experiments using RIPE Atlas and analysed traffic from two production zones (.nl and the Roots).
Our study reached four main conclusions, which serve as recommendations for both operators and developers:
- We found that, in the wild, caching often behaves as expected, but about 30% of the time clients do not benefit from caching (Section 3). That observation was confirmed in traffic from both .nl and the Roots (Section 4).
- On the recursive resolver side, caching and retries make the client user experience considerably more resilient during DDoS attacks (Section 5). Even with very heavy query loss (90%) on all authoritatives, full caches protect half of the clients, and retries protect 30%.
- On the authoritative side, we showed that legitimate traffic increases substantially during a DDoS attack (up to eightfold in our experiments), mainly due to retries. While DNS servers are typically heavily overprovisioned, this result suggests the need to review the level of overprovisioning (Section 6).
- Finally, our results enable us to suggest why users have experienced relatively little impact from DDoS attacks on Root servers while the customers of some DNS providers felt the impact of attacks straight away (Section 8).
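The link between retries, client protection, and extra traffic at the authoritatives can be illustrated with some simple arithmetic. The sketch below assumes a deliberately simplified model that is not the methodology of the study: every query is lost independently with the same probability, and a resolver stops at the first reply or after a fixed number of attempts. The function names and the attempt counts are hypothetical, and the model is not meant to reproduce the measured figures above.

```python
def success_probability(loss_rate, attempts):
    """Chance that at least one of `attempts` independent queries gets through."""
    if not 0.0 <= loss_rate <= 1.0 or attempts < 1:
        raise ValueError("need 0 <= loss_rate <= 1 and attempts >= 1")
    return 1.0 - loss_rate ** attempts


def expected_queries(loss_rate, attempts):
    """Expected queries sent when stopping at the first reply.

    Attempt i + 1 is only sent if the first i attempts were all lost,
    which happens with probability loss_rate ** i.
    """
    if not 0.0 <= loss_rate <= 1.0 or attempts < 1:
        raise ValueError("need 0 <= loss_rate <= 1 and attempts >= 1")
    return sum(loss_rate ** i for i in range(attempts))
```

Under these assumptions, with 90% loss a single query succeeds 10% of the time, while three attempts succeed about 27% of the time (1 - 0.9^3) at a cost of about 2.7 queries per lookup on average (1 + 0.9 + 0.81). Each retry therefore helps the client, but it also multiplies the legitimate load arriving at servers that are already under attack.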
A previous version of this post appeared on the RIPE Atlas blog.