Spectacular UPS Failure

A bit off topic, but I should document what happened at work today. I got called to investigate a burning smell in one of the offices that house all the servers and network head end. The request was placid enough not to cause alarm, but when I got to the room the smell hit as soon as the door opened. Narrowing it down, the smell was coming from a caged-off area underneath a desk that held the servers: an ancient IBM RS6000 with its own UPS and two HP Proliant ML350 G5s with a shared UPS in two modules, along with what seemed like decades of dust, discarded cables and old computer hardware that had accumulated over the years.

Servers claimed by years of dust

Once I got down there and started to fathom out which cables were in use and what could be safely isolated without stopping operations, small wafts of smoke could be seen drifting up from under the desk. At this point it was obvious that the time left for diagnosing the issue was shrinking, along with the grace period before the smoke detectors triggered the fire alarms and cleared the store.

On first look, I noticed that one of the Proliant servers had a flashing LED next to a power symbol. I put two and two together and assumed a power supply had failed spectacularly, so I chose to switch that server off, knowing it was only there for redundancy.

A minute passed with no let-up in the smoke, and by this time a CO2 extinguisher, pin pulled, was close at hand. Out of ideas, I pulled all the plugs from the wall. The RS6000 UPS failed immediately, while the Proliants carried on under battery juice with 105 minutes left according to their UPS display (one was still powered off). I left it another minute to rule out a problem with an input to the UPS, and with nervous relief saw the smoke subside. A few back office systems went down with the RS6000, but the customer-facing Proliant stayed online.

With the batteries keeping customer-facing systems online for a further hour or so, it was a safe time to find the culprit. An extensive sniff test pointed to the UPS for the RS6000 as the source of the incident, which is possibly why it failed as soon as power was cut. It was taken out of commission and bypassed to get the IBM machine back online.

Failed UPS, I'm not so trusting of you anymore

A rather eventful day compared to the normal, mundane non-IT job. I haven’t opened up the failed UPS to see what went wrong, nor would I want to, considering what state the (probably) lead-acid cells are in.