Monday, November 9, 2009

Preventing DNS Outages

Earlier this afternoon there was a DNS outage at the Cybercon data centre, which interrupted service to the WunderCounter. One of the core WunderCounter machines relied solely on Cybercon's in-house DNS for contact with the outside world, so when that DNS went down the machine went offline with it. That made DNS a single point of failure (SPOF) for this machine, which means it was incorrectly configured. Once the onsite outage was over, I had staff revive the machine, and I've reconfigured its DNS to query servers from two unrelated providers in order to avoid this problem in future.

I'm going to audit the rest of the machines on the network to ensure that this sort of DNS outage doesn't affect the WunderCounter in future. One positive out of all of this is that every WunderCounter server that relies on OpenDNS was unaffected. I highly recommend OpenDNS for both home and commercial use. I've found it to be an excellent service.
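
For anyone who wants to run a similar audit, here is a rough sketch in Python of the kind of check involved. It is not the script used on the WunderCounter network. It assumes each machine lists its resolvers in resolv.conf-style text, and the provider table is hypothetical: the two OpenDNS addresses are the public OpenDNS resolvers, while 192.0.2.53 is only a placeholder from the documentation address range standing in for an in-house server. The function names are made up for illustration.

```python
# Hypothetical mapping from resolver address to the provider that runs it.
# 208.67.222.222 and 208.67.220.220 are the public OpenDNS resolvers;
# 192.0.2.53 is a placeholder, not a real in-house DNS address.
PROVIDERS = {
    "208.67.222.222": "opendns",
    "208.67.220.220": "opendns",
    "192.0.2.53": "in-house",
}


def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def has_dns_spof(resolv_conf_text, providers=PROVIDERS):
    """Return True if all configured resolvers belong to one provider (or none are set)."""
    servers = parse_nameservers(resolv_conf_text)
    # An address missing from the table is treated as its own provider,
    # so the table needs to be complete for the result to be trusted.
    distinct = {providers.get(address, address) for address in servers}
    return len(distinct) < 2


if __name__ == "__main__":
    sample = "nameserver 192.0.2.53\n"
    print("single point of failure:", has_dns_spof(sample))
```

Note that two resolvers from the same provider still count as a single point of failure here, since the goal is to survive one provider's outage, not just one server's.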

So, my apologies for the DNS issues -- this particular problem will not occur in future. Mistakes will happen, but what's most important is that they're not repeated.
