The Root Canary Project
In 2017, we joined the Root Canary project, a joint project run by seven partners: SURFnet, the University of Twente, Northeastern University, NLnet Labs, SIDN Labs, the RIPE NCC and ICANN. The goal of the international collaborative project was to carefully monitor the planned root KSK rollover from as many angles as possible and to act as an early warning system if anything went wrong.
The KSK rollover
The root zone was protected with DNSSEC in July 2010. For DNSSEC to provide continuing security, it is recommended that the keys are rolled over at certain intervals. Although the IANA Functions Contract requires the KSK to be rolled over, it includes no detailed timeline or implementation plan. The Root Zone KSK Operator DPS is more specific, making the following rollover requirement in Section 6.5:
"Each RZ KSK will be scheduled to be rolled over through a key ceremony as required, or after 5 years of operation."
After extensive preparations, the new KSK was generated on 27 October 2016, marking the start of the KSK rollover process. The original date set for rolling over the root DNS key was 11 October 2017. However, after extensive community review, the rollover was postponed for a year. As a result, the rollover took place on 11 October 2018.
Here is how we monitored that process through the Root Canary project.
The project used all of the 10,000-plus RIPE Atlas probes to measure the rollover at regular intervals, by sending DNS queries for the DNSKEY record set of the root zone and analysing the RRSIGs in the responses. Resolvers cache the RRSIG for up to its TTL of 172800 seconds (48 hours). Our measurements were intended to determine whether and when resolvers would drop the old RRSIG (made with key ID 19036) from their caches and pick up the new one, made with key ID 20326 and published from 11 October 2018 at 16:00 UTC. We also wanted to establish whether validation would continue to work as expected.
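To make the measurement logic concrete, here is a minimal sketch in Python of how a single response could be classified and how the cache deadline follows from the TTL. It is an illustration only, not the project's actual code: the function names and the idea of passing in a plain list of key IDs are assumptions, while the key IDs, TTL and rollover time are taken from the text above.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the article.
OLD_KEY_ID = 19036                      # key that made the old RRSIG
NEW_KEY_ID = 20326                      # key that made the new RRSIG
MAX_TTL = timedelta(seconds=172800)     # 48 hours
ROLLOVER = datetime(2018, 10, 11, 16, 0, tzinfo=timezone.utc)


def classify_rrsig(key_ids):
    """Classify one resolver response by the key IDs found in the RRSIGs
    covering the root DNSKEY set (hypothetical helper).

    Returns "new", "old" or "unknown". Treating a response that carries
    both key IDs as "new" is an assumption of this sketch.
    """
    ids = set(key_ids)
    if NEW_KEY_ID in ids:
        return "new"
    if OLD_KEY_ID in ids:
        return "old"
    return "unknown"


def latest_cache_expiry(rollover=ROLLOVER, ttl=MAX_TTL):
    """Latest moment at which a resolver honouring the TTL can still
    serve the old RRSIG from its cache (hypothetical helper)."""
    return rollover + ttl


print(classify_rrsig([19036]))   # old
print(latest_cache_expiry())     # 2018-10-13 16:00:00+00:00
```

A resolver that fetched the old RRSIG just before the rollover may therefore keep serving it until 13 October 2018 at 16:00 UTC, which is why the measurements had to cover the full 48 hours after the rollover.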
If a large number of resolvers had been observed to malfunction after the KSK rollover, that would have indicated an anomaly and could have led to the implementation of a back-out plan. Luckily, as our measurements showed, that turned out not to be the case.
A few hours before the KSK rollover, we started publishing information about the status of the resolvers we were monitoring. The information was published at short intervals via our Twitter account and on the website of our project partners at nlnetlabs.nl. We continued publishing the information until two days after the rollover.
First, the most important result: from our perspective, the rollover was a success. We did not observe any major deviations during the rollover or in the following 48 hours. The red line in the graph below shows the number of resolvers that were not able to validate DNSSEC signatures after the rollover. As you can see, the line stays below 1%, which is what we were hoping to see. The number of validating resolvers also remained stable.
The observations imply that resolvers successfully replaced the old signature, created with the old 2010-KSK, with a new signature, created with the new 2017-KSK. The figure below shows that 16 hours after the rollover, 50% of resolvers already had the new signature in their caches and were validating with the new key. After 48 hours, more than 99% of resolvers were up to date, which was the expected outcome.
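The uptake figures quoted here are simply the share of resolvers seen with the new signature in each measurement round. A minimal Python sketch of that aggregation is given below. The function names and the input format are assumptions made for illustration, and the sample observations are invented to show the mechanics; they are not the project's data.

```python
from collections import defaultdict


def uptake_by_round(observations):
    """Return {hours_after_rollover: share of resolvers with the new RRSIG}.

    `observations` is an iterable of (hours_after_rollover, state) pairs,
    where state is "new" or "old" (hypothetical input format).
    """
    totals = defaultdict(int)
    new = defaultdict(int)
    for hours, state in observations:
        totals[hours] += 1
        if state == "new":
            new[hours] += 1
    # Build the result in ascending order of measurement round.
    return {h: new[h] / totals[h] for h in sorted(totals)}


def first_round_reaching(uptake, threshold):
    """First measurement round (in hours) whose share meets the threshold,
    or None if no round does."""
    for hours in sorted(uptake):
        if uptake[hours] >= threshold:
            return hours
    return None


# Invented sample: two resolvers observed in three rounds.
sample = [
    (0, "old"), (0, "old"),
    (16, "new"), (16, "old"),
    (48, "new"), (48, "new"),
]
uptake = uptake_by_round(sample)
print(uptake)                             # {0: 0.0, 16: 0.5, 48: 1.0}
print(first_round_reaching(uptake, 0.5))  # 16
```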
Regarding the situation in the Netherlands: RIPE Atlas provides us with over 500 vantage points located in the Netherlands. Only four of them reported issues during the rollover, all on different networks. They indicate local, isolated issues rather than a major issue affecting a larger service provider.
To sum up, the rollover was a great success from our point of view. We did not detect any major issues with resolvers whatsoever; other sources also reported no significant issues. We congratulate ICANN and the many people involved in the rollover process on this noteworthy achievement.
We intend to follow up our immediate observations by diving deeper into our measurements and the raw data that we collected. Also, the Root Zone Operators will provide traffic traces for the period of the rollover, which should provide us with further insight, particularly into resolvers that are not covered by the RIPE Atlas infrastructure. We intend to define and provide guidelines for future root KSK rollovers, which will hopefully be carried out more regularly and with greater confidence in the future.
Further results will be published here and at https://rootcanary.org.
Many thanks to the RIPE NCC for running the measurements for us, and a big thank you for the helpful feedback we received from the community.