Wednesday, January 13, 2010

When a Power Supply Burns Out Late at Night...

You may have noticed some downtime earlier this evening (Tuesday, Jan 12), around 10:20 pm CST. One of the main WunderCounter servers went offline with a burned-out power supply. I get an SMS within a minute of this kind of thing, so we were able to begin dealing with it immediately. Normally losing a power supply is not a big deal, but in this case the data centre didn't have a matching part on hand, so I had one of the techs pull an unused server off the rack, extract its power supply and use it to bring the front end of the WunderCounter back online. The entire process took about 90 minutes: finding the problem, diagnosing it, locating the correct part, replacing it, and getting the machine back on the rack and correctly connected to the network.

I apologize for the downtime. I was already in the process of replacing this machine with a pair of newer units, but the new machines aren't fully configured and tested yet. I hope to have that done in the next week or so, which will add some more redundancy to the system. I'll also be taking a closer look at what's on the rack and possibly having some more machines and/or parts shipped in, so that I continue to have enough spare hardware on hand for any future issues.
