Monday, November 9, 2009

Preventing DNS Outages

Earlier this afternoon there was a DNS outage at the Cybercon data centre. This interrupted service to the WunderCounter. One of the core WunderCounter machines relied solely on Cybercon DNS for contact with the outside world, so it went offline along with the in-house DNS. This is known as a single point of failure (SPOF), which means this machine was incorrectly configured. Once the onsite outage was over, I had staff revive the machine, and I've reconfigured its DNS to query servers from two unrelated providers in order to avoid this problem in future.
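For the curious, here is a minimal sketch in Python of the kind of check that catches this sort of misconfiguration. It is not the actual WunderCounter tooling. It assumes the resolver settings are in resolv.conf format, and the addresses and provider labels in PROVIDER_OF are hypothetical placeholders taken from the documentation address ranges.

```python
# Hypothetical mapping of resolver addresses to the provider that runs them.
# The addresses are documentation-range placeholders, not real DNS servers.
PROVIDER_OF = {
    "192.0.2.1": "provider-a",
    "192.0.2.2": "provider-a",
    "198.51.100.1": "provider-b",
}


def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", ";")):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def has_dns_spof(resolv_conf_text, provider_of):
    """True if fewer than two known providers back the configured resolvers."""
    servers = parse_nameservers(resolv_conf_text)
    # Only resolvers we can attribute to a provider count towards redundancy.
    providers = {provider_of[s] for s in servers if s in provider_of}
    return len(providers) < 2
```

A machine listing only 192.0.2.1 and 192.0.2.2 would be flagged here, since both placeholders belong to the same hypothetical provider, even though two servers are configured.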

I'm going to audit the rest of the machines on the network to ensure that this sort of DNS outage doesn't affect the WunderCounter in future. One positive out of all of this is that the WunderCounter servers which rely on OpenDNS were unaffected. I highly recommend OpenDNS for both home and commercial use. I've found it to be an excellent service.

So, my apologies for the DNS issues -- this particular problem will not occur in future. Mistakes will happen, but what's most important is that they're not repeated.

Friday, November 6, 2009

Keeping a Slave Database in Sync

I haven't posted too many details of exactly how the WunderCounter is set up, but it uses MySQL replication. This means that there is always one master database server and one or more slave databases which download database updates from the master. The slave databases work much harder than the master and may occasionally require a reboot. Sometimes, when this happens, database tables crash and need to be repaired. This happened earlier today.

Usually the slave catches up to the master within a few minutes and the lag is barely noticed, but in this case, the crashed table had over 10 million rows. When you're dealing with that amount of data, MySQL isn't always able to repair the tables in a reasonable amount of time for a live site.

While I was dealing with this, user log files appeared to be stuck and were not updating, which may have made you think your hits were not being tracked. What was actually happening was that your hits were being tracked by the master database but were not reaching the slave database, which is the one the reporting scripts connect to. The issue with the rogue table has now been fixed and the slave has caught up to the master. So, you can now view any of the hits which you were unable to view earlier. No data was lost, only delayed. :)
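As an illustration of how this kind of lag can be spotted, here is a rough Python sketch. It assumes you have already fetched one row of MySQL's SHOW SLAVE STATUS output as a dictionary. The function name and the 300 second threshold are my own assumptions for the example, not the WunderCounter's actual monitoring.

```python
def replication_problem(status, max_lag_seconds=300):
    """Describe what is wrong with a slave, or return None if it looks healthy.

    `status` is one row of SHOW SLAVE STATUS as a dict. The threshold is an
    assumed value chosen for illustration only.
    """
    if status.get("Slave_IO_Running") != "Yes":
        return "I/O thread is not running"
    if status.get("Slave_SQL_Running") != "Yes":
        return "SQL thread is not running"
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # MySQL reports NULL here when it cannot measure the lag.
        return "lag is unknown"
    if int(lag) > max_lag_seconds:
        return f"slave is {int(lag)} seconds behind the master"
    return None
```

In a situation like today's, the master keeps recording hits while a check like this reports a growing lag on the slave, which is why reports looked stuck even though nothing was lost.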

Wednesday, November 4, 2009

Dropping Support for Internet Explorer 6

Internet Explorer 6 is dead. Well, maybe not totally. I did some digging and I see that over the first few days of this month fewer than 2% of traffic to the dashboard page has been from MSIE 6 users.
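If you'd like to run the same kind of check on your own traffic, a minimal Python sketch might look like the following. It assumes you have a list of User-Agent strings from your logs and that matching on the "MSIE 6." token is good enough. Some other browsers have been known to spoof that token, so treat the result as an estimate. The function name is my own.

```python
def msie6_share(user_agents):
    """Return the percentage of user-agent strings that claim to be MSIE 6."""
    total = 0
    msie6 = 0
    for ua in user_agents:
        total += 1
        # MSIE 6 identifies itself with a token such as "MSIE 6.0".
        if "MSIE 6." in ua:
            msie6 += 1
    if total == 0:
        return 0.0
    return 100.0 * msie6 / total
```

For example, one MSIE 6 string in a sample of four user agents gives a share of 25.0.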

Normally I wouldn't make an effort to shut a particular browser out, but basic things like CSS menus become much more difficult if you have to account for this very dated browser. I need to update the menu system in the member area. It's a bit buggy and not very pretty and it has to bend over backwards to accommodate MSIE in general.

I realize there are some situations where you are forced to use a particular browser. Some corporate environments lock down their machines to the point where employees cannot install any software at all. In some of these very paranoid cases, MSIE 6 is still the browser of choice. I can't tell you exactly why that is, as it's hard to fathom.

At any rate, if you're one of the remaining MSIE 6 users on this planet and you're able to upgrade or, even better, switch to Firefox or Chrome, I would encourage you to do so in the near future.