Updated: May 6, 2021
Getting the Alert
Alarm information started coming in through many different formats, from a phone call to an email to an alarm output. The alarms came from different departments, such as Voice, Data, Video, and Utilities. We proceeded to take note of all alarms and started strategic troubleshooting.
Getting to the Site
After receiving a phone call about high-temperature and low-voltage alarms in the Disaster Recovery Trailer, we arrived at the headend to find high-temperature alarms and generator disconnect alarms also going off there. Furthermore, to everyone’s surprise, there was a commercial power loss that had not been recorded, and no one knew how it had happened.
We found out that a local power company was doing work on the street and that, according to the client, they had not been notified. When the site lost power, the load was transferred to the generator, which caused the breaker to trip. That left the site without utility power and without cooling, so the equipment ran on batteries and began to overheat.
Going Into Troubleshooting Mode
We first needed to find the breaker that had tripped and get utility power back up. Once we found it, utility power was restored, including lights and HVAC, and we could begin troubleshooting the remaining issues.
Issues at the Disaster Recovery Trailer
Low voltage alarm
High temp alarm
Commercial power loss alarm
Inverter shutdown
Resolving the issues:
Finding the tripped breaker automatically started resolving many of the ongoing issues. The batteries, which typically take 4 hours to recharge, began recharging, and the HVAC kicked back in, which allowed the high temp alarm to clear once the temperature stabilized after about two hours. Through this emergency, we found out that the commercial power loss alarm had not actually been triggered by the ongoing power outage, and we were able to flag it as an ongoing issue. To clear the inverter shutdown alarms, the inverter had to be turned on and power cycled. The alarms were going off because it was too hot.
Issues at the Headend
No utility power
A rectifier blew
Servers were overheating
The data router failed
Resolving the issues:
Many alarms cleared after the utility power was back. As these alarms were clearing, we received a call indicating that a rectifier had blown. Thankfully, there was a spare rectifier, which we used as a replacement. Replacing the rectifier cleared that alarm. After power cycling the servers failed to bring them back, we realized that the SFP (small form-factor pluggable) module needed to be re-seated in the router. Once this was done, we were able to restore the servers. Unfortunately, the province was not equipped with a spare SFP, so to temporarily restore the data router while parts were being shipped in, we transferred the traffic to a protection card.
What we learned
We learned that we need more alarm visibility when dealing with high temperatures and power loss. Unfortunately, we did not have visibility on all equipment. The top priority is to restore utility power so that temperatures come down, which then allows the batteries to charge. Once this is done, you can begin troubleshooting failed equipment.
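The priority order described above can be sketched as a small triage helper. This is only an illustration: the category names, the `TRIAGE_ORDER` table, and the `triage` function are hypothetical and do not come from any alarm system used on site.

```python
# Hypothetical triage helper: orders open alarms by the priority described above.
# Category names and rankings are illustrative assumptions, not a real alarm API.
TRIAGE_ORDER = {
    "utility_power": 0,  # restore utility power first
    "temperature": 1,    # cooling comes back once power is restored
    "battery": 2,        # batteries recharge once temperatures come down
    "equipment": 3,      # failed hardware is troubleshot last
}


def triage(alarms):
    """Return alarms sorted by triage priority; unknown categories go last."""
    last = len(TRIAGE_ORDER)
    return sorted(alarms, key=lambda alarm: TRIAGE_ORDER.get(alarm["category"], last))


open_alarms = [
    {"site": "Headend", "category": "equipment", "text": "Data router failed"},
    {"site": "DR Trailer", "category": "battery", "text": "Low voltage alarm"},
    {"site": "Headend", "category": "utility_power", "text": "No utility power"},
    {"site": "DR Trailer", "category": "temperature", "text": "High temp alarm"},
]

for alarm in triage(open_alarms):
    print(f'{alarm["site"]}: {alarm["text"]}')
```

Sorting with a lookup table keeps the priority rule in one place, so the order can be adjusted without touching the alarm data itself.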
Our Best Advice
Don’t get stressed
You work much better and think more clearly when you are not stressed
Dispatch does not always have full visibility on the emergency, so use your judgment and trust your own experience.
Ask lots of questions, repeat tasks back to dispatch, and talk through everything you’re doing.
While this was a high-pressure situation, it gave us a really good learning experience as we got to work with different equipment and alarms.