The Unnoticed Threat: Network Alarms and the Overlooked Role of Timing
In a January 2024 conversation, Rob Jodrie, Technical Support Director at Syncworks, shed light on the often underappreciated realm of network alarms and on why they don’t receive the attention they deserve. Alarms signal potential network failures, yet they have historically been overlooked. Despite new technology, the issue persists today.
Network engineers now have tools that make this oversight less likely to demand an entry in the FCC Network Outage Reporting System. However, a threat still looms over many network operators. Aging timing equipment is the weak link most likely to send out an alarm, and its contact-closure-only method of alarming makes it a danger. These alarms are called “idiot alarms.” Mr. Jodrie did not specify whether “idiot” referred to the user or to the equipment.
Despite the FCC NORS stigma and the unexpected costs associated with getting the network back to Stratum 1, network alarms continue to be overlooked. A back study that Mr. Jodrie participated in showed that “alarms and power were the two main causes found in the required NORS reports. With all this evidence pointing to a simple solution, there’s still no real good answer as to why they keep going unnoticed,” Mr. Jodrie said. “It’s an old story. It goes like this: ‘Pay attention to your timing alarms.’”
What Equipment Is Most At Risk for a Network Outage Reporting System Event?
Enjoying phenomenal success from its introduction, the Telecom Solutions DCD 523 Digital Clock Distributor Network Synchronization System (with dual rubidium oscillators) is still in operation today. Despite an amazing run in NOCs across the USA and the world, time is running out for this clock’s dual rubidium oscillators. Nothing lasts forever, and when end of life comes for these warriors, the only sound they will make is a silent alarm. This will send the clock into holdover. What we at Syncworks would like to ensure is that someone is there to see it before the holdover starts to fail and affects service. Otherwise, the next call the operator makes after us will be to the FCC, to file a NORS report.
What is NORS?
In 2004, the FCC implemented outage reporting rules to ensure swift, comprehensive, and accurate information about significant communication service disruptions with potential impacts on homeland security, public health, safety, and the nation’s economic well-being. Communication providers, including wireline, cable, satellite, wireless, interconnected VoIP, and Signaling System 7 providers, must adhere to these rules. Reportable network outages lasting at least 30 minutes trigger mandatory reporting in the Commission’s Network Outage Reporting System (NORS). Data submitted to NORS is considered confidential.
Depending on the provider type, notifications must be made within specific timeframes: preliminary information within 120 minutes, an initial outage report within three calendar days, and a final report within 30 days of outage discovery. Interconnected VoIP providers follow a similar process, with variations based on the outage’s impact. Covered 911 service providers, responsible for aggregating and delivering 911 traffic, have specific notification timelines and information-sharing requirements when an outage affecting a 911 call center occurs. Source
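To make the timeline concrete, here is a minimal Python sketch of the deadline arithmetic described above. It uses only the figures quoted in this section (30 minutes, 120 minutes, three calendar days, 30 days). The function names are hypothetical, and measuring all three deadlines from the moment of discovery is a simplifying assumption; the rules that apply to a given provider type should be checked against the FCC's own requirements.

```python
from datetime import datetime, timedelta

# Timeframes as summarized in the text above (not a legal reference).
MIN_REPORTABLE_DURATION = timedelta(minutes=30)
NOTIFICATION_WINDOW = timedelta(minutes=120)
INITIAL_REPORT_WINDOW = timedelta(days=3)
FINAL_REPORT_WINDOW = timedelta(days=30)


def meets_duration_threshold(outage_start, outage_end):
    """True if the outage lasted at least 30 minutes (duration test only)."""
    return outage_end - outage_start >= MIN_REPORTABLE_DURATION


def nors_deadlines(discovered_at):
    """Return the three filing deadlines.

    Assumption: every window is counted from the time of discovery.
    """
    return {
        "notification": discovered_at + NOTIFICATION_WINDOW,
        "initial_report": discovered_at + INITIAL_REPORT_WINDOW,
        "final_report": discovered_at + FINAL_REPORT_WINDOW,
    }


# Example: an outage discovered at 08:00 on January 15, 2024.
discovered = datetime(2024, 1, 15, 8, 0)
for name, due in nors_deadlines(discovered).items():
    print(f"{name}: due by {due:%Y-%m-%d %H:%M}")
```

The point of the sketch is how short the first window is: an alarm that sits unnoticed for two hours has already used up the notification deadline.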
Rob Jodrie – Technical Support Director at Syncworks
The FCC Network Outage Reporting System and the Odd Reality of Timing
Rob, reflecting on his Tier 2 Technical Support career, highlighted the historical neglect of timing in network operations. He shared insights into FCC reportable events, where outages of a certain size, capacity, or duration had to be reported to the FCC, often resulting in fines and a root cause analysis. Surprisingly, a back study revealed that power and timing were the most common culprits behind these expensive events.
We used to have to do write-ups on what they call FCC reportable events and those were, I forget the exact specifics, but if you had an outage of either a certain size, capacity, bandwidth was down, or of a certain duration, length of time, or both, you’d have to, in some cases, report that to the FCC, and it was not a good thing. Fines would come about and all this, and they would ask for root cause analysis. Meaning you’ve got to tell me what was the thing that caused this issue, you know. And then you were supposed to, you know, get a plan to say here’s how we addressed it.
But the point is, we did kind of a back study on that at one point in time and we found that the two most common issues causing FCC reportables were power and timing.
So it’s really interesting to me that, you know, these FCC reportables were not small events, they were very, very expensive events, and it just has always surprised me how little attention timing gets.
Telecom Solutions and BITS Clocks
The conversation turned toward Telecom Solutions BITS clocks, workhorses that have been running for decades. Rob reminisced about the discrete alarms of the early days, simple contact closures that provided minimal information. Even today, thousands of these systems are active in the U.S., relying on basic alarming methods that necessitate physical inspection to determine the issue.
The interesting thing is, in the early days, and this is still out here in today’s market, these Telecom Solutions BITS clocks, which are workhorses, they’ve been running for literally decades. We put our first one in Portland, ME in 1987, I think. But they have what they call discrete alarms, which means it’s a contact closure, and all it does is send a relay event, if you will, to what they call a scan point.
And it just detects: is there a short across the line or is there an open? If there’s a short across the line, the words come up and it says “sync major,” “sync minor,” “sync critical.” That’s it. That’s all you know. You don’t know if you’re in holdover, if the whole thing is down; you have zero idea. It’s just a dumb alarm, and you would see that alarm and you would have to get a pair of hands out there to stand in front of the box, look at the system, and determine based on lights what we’ve got, what’s going on. So it gets kind of interesting, and that is still the case. The next generation of gear, thankfully, is intelligent. We can log on to it and retrieve information from it. We can find out: is it in holdover, do I have references, power trouble, what’s the story, without rolling a truck.
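The gap between the two generations of gear can be illustrated with a small Python sketch. It is only a model of the idea described above, not the behavior of any particular clock or scan point: the labels come from the quote, while the function names and the `ClockStatus` fields are hypothetical.

```python
from dataclasses import dataclass

# One label per wired contact pair, as named in the quote above.
SCAN_POINT_LABELS = ("sync critical", "sync major", "sync minor")


def scan_point_report(contact_closed):
    """Return the labels whose contact reads as a short (closed).

    An open reads as "no alarm", whatever the reason for the open,
    and no further detail is available.
    """
    return [label for label in SCAN_POINT_LABELS
            if contact_closed.get(label, False)]


@dataclass
class ClockStatus:
    """Detail an intelligent clock might return remotely (illustrative fields)."""
    in_holdover: bool
    good_references: int
    power_ok: bool


def describe(status):
    """Turn the richer status into a sentence a technician can act on."""
    parts = [
        "in holdover" if status.in_holdover else "locked to a reference",
        f"{status.good_references} good reference(s)",
        "power OK" if status.power_ok else "power trouble",
    ]
    return ", ".join(parts)


# Legacy gear: a label and nothing more.
print(scan_point_report({"sync major": True}))
# Intelligent gear: enough to decide whether a truck roll is needed.
print(describe(ClockStatus(in_holdover=True, good_references=0, power_ok=True)))
```

With the contact closure, the only fact on the screen is the label. Everything the second function reports would otherwise require someone standing in front of the box.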
Evolution of Alarm Systems
Despite the persistence of legacy systems, the next generation of gear brings intelligence to the forefront. With the ability to log in remotely, retrieve information, and diagnose issues without dispatching a truck, modern systems have come a long way. However, the conversation acknowledged that a vast number of existing systems still rely on primitive alarming methods because aging telecom timing equipment, specifically the Telecom Solutions DCD, is still in use.
So the NORS and alarms have come a long way, but there are thousands and thousands of those Telecom Solutions systems still out and active in networks in the United States. It’s hard to believe, but that’s what it is.
And sometimes we would actually find that the alarming, that piece of wire that would go from the BITS clock to the scan point that would report back to the NOC, sometimes that wire would get ripped out and there was no alarming. So we’d walk into an office, look at the BITS clock, see it’s in alarm, call the NOC and say, “Why did you not respond to the alarm?” “We don’t see any alarm.” The lack of love for timing has been sort of an interesting thing to note over these years.
As time marches on, I have seen people becoming more aware of alarms and of the FCC Network Outage Reporting System at times, but for the most part it usually takes a big, service-affecting outage to get their attention.
Only then is there a lot of attention given to it.
How Alarms Manifest Themselves Today
One of the most common alarming methods in use today is something called SNMP. It stands for Simple Network Management Protocol, and it’s an intelligent thing. What you do is you have an SNMP server located somewhere centrally and you point this intelligent equipment, via an IP address scheme, back to it. So when you have a device that’s intelligent and some problem comes in where it has lost its GPS signal, it then sends an intelligent piece of information back to this SNMP server. The server translates it and gives actionable, almost pinpoint information along the lines of “this piece of equipment at this location has this issue, and now somebody needs to respond to that.” With SNMP you have more to go on than “sync major” and a red light.
Now, when someone calls our SyncCare support line and says, “I’m in holdover, I’ve lost GPS,” those issues are noted in the alarm logs, so you can go back and ask: when did this happen? Has this been a week, two weeks? What events were there? There’s just more information you can collect, but it’s all usually done via SNMP.
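The translation step described here can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a real SNMP implementation: it does not use an SNMP library, and the `Notification` fields, the inventory table, the IP address, and the event names are all hypothetical stand-ins for what a central server would decode and look up.

```python
from dataclasses import dataclass


@dataclass
class Notification:
    """Simplified stand-in for a decoded notification (hypothetical fields)."""
    source_ip: str
    event: str      # e.g. "gps_lost"
    severity: str   # e.g. "major"


# Hypothetical inventory: device IP address -> (device name, location).
INVENTORY = {
    "10.0.0.5": ("BITS clock 1", "Example Central Office"),
}

# Hypothetical event names mapped to plain-language descriptions.
EVENT_TEXT = {
    "gps_lost": "has lost its GPS signal",
    "holdover": "has entered holdover",
}


def actionable_message(n):
    """Combine who, where, and what into one line for the NOC."""
    device, site = INVENTORY.get(n.source_ip,
                                 ("unknown device", "unknown location"))
    what = EVENT_TEXT.get(n.event, f"reported event '{n.event}'")
    return f"[{n.severity.upper()}] {device} at {site} {what}; a response is needed."


print(actionable_message(Notification("10.0.0.5", "gps_lost", "major")))
```

The useful part is the lookup: because the equipment reports over IP, the server can tie the event to a named device and site instead of an anonymous scan point.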
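The questions in that quote ("when did this happen?", "what events were there?") are simple to answer once the events are logged. The Python sketch below assumes a hypothetical log format of timestamp and event name; the event names and the sample entries are invented for illustration and do not reflect any specific product's log.

```python
from datetime import datetime

# Hypothetical alarm log entries: (timestamp, event name).
ALARM_LOG = [
    (datetime(2024, 1, 2, 3, 15), "gps_lost"),
    (datetime(2024, 1, 2, 3, 16), "holdover_entered"),
    (datetime(2024, 1, 9, 14, 0), "reference_minor"),
]


def holdover_started(log):
    """Return when the current holdover began, or None if not in holdover.

    Walks the log in time order; a later 'holdover_cleared' cancels an
    earlier 'holdover_entered'.
    """
    started = None
    for when, event in sorted(log):
        if event == "holdover_entered":
            started = when
        elif event == "holdover_cleared":
            started = None
    return started


def events_since(log, since):
    """Return every logged event at or after the given time, in order."""
    return [(when, event) for when, event in sorted(log) if when >= since]


now = datetime(2024, 1, 16, 3, 16)
start = holdover_started(ALARM_LOG)
if start is not None:
    print(f"In holdover for {(now - start).days} days")
    for when, event in events_since(ALARM_LOG, start):
        print(f"  {when:%Y-%m-%d %H:%M}  {event}")
```

In this invented example the clock has been in holdover for two weeks, which is exactly the kind of answer a contact closure cannot give.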
Beyond SNMP: Possible Solutions to Missed Alarms Resulting in FCC NORS
From Mr. Jodrie’s comments, it is clear that the older Telecom Solutions DCD equipment won’t have the luxury of SNMP. Beyond SNMP, several possible solutions for making network technicians more aware of outage alarms include:
1. Enhanced Remote Monitoring and Logging:
– Establish remote monitoring capabilities to allow technicians to log in from a central location and retrieve information without the need for on-site visits.
– Utilize advanced logging mechanisms that record the history of alarms, enabling technicians to review past events and identify patterns.
2. Training and Awareness Programs:
– Conduct regular training programs for network technicians to enhance their understanding of alarm systems and the critical role of timing in network operations.
– Increase awareness about the potential impact of timing issues through educational initiatives, emphasizing the financial and operational consequences of overlooked alarms.
3. Automated Notifications:
– Implement automated notification systems that promptly alert technicians when an alarm occurs, reducing response time and ensuring that issues are addressed promptly.
– Utilize modern communication channels, such as instant messaging or mobile apps, to deliver real-time notifications to technicians.
4. Data Analysis and Predictive Maintenance:
– Employ data analysis tools to identify patterns and trends in alarm occurrences, allowing for proactive measures to prevent potential issues.
– Implement predictive maintenance strategies to anticipate and address potential timing-related problems before they lead to outages.
5. Collaborative Platforms:
– Foster collaboration among technicians through centralized platforms where they can share insights, discuss challenges, and collectively address alarm-related issues.
– Encourage a culture of knowledge-sharing within the network operations team to enhance overall awareness and expertise.
6. Regular Audits and Assessments:
– Conduct regular audits of alarm systems and equipment to ensure that they are functioning optimally and are aligned with industry best practices.
– Perform assessments to identify any gaps in technician awareness or training and address them proactively.
By implementing a combination of these solutions, network technicians can become more aware of outage alarms, leading to improved response times, better problem resolution, and a proactive approach to maintaining network stability.
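As one illustration of how the logging, automated notification, and data analysis ideas above could fit together, here is a minimal Python sketch. Everything in it is an assumption made for the example: the history format, the device names, the 30-day window, and the threshold of three alarms. The `send` parameter stands in for whatever messaging channel a team actually uses.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical alarm history: (timestamp, device name, alarm text).
HISTORY = [
    (datetime(2023, 10, 1, 9, 0), "clock-a", "sync minor"),
    (datetime(2024, 1, 5, 2, 0), "clock-a", "sync minor"),
    (datetime(2024, 1, 10, 6, 30), "clock-b", "sync major"),
    (datetime(2024, 1, 12, 2, 0), "clock-a", "sync minor"),
    (datetime(2024, 1, 20, 2, 0), "clock-a", "sync major"),
]


def repeat_offenders(history, now, window=timedelta(days=30), threshold=3):
    """Return devices with at least `threshold` alarms inside the window."""
    cutoff = now - window
    counts = Counter(device for when, device, _ in history if when >= cutoff)
    return sorted(device for device, n in counts.items() if n >= threshold)


def notify(devices, send=print):
    """Send one message per flagged device through the given channel."""
    for device in devices:
        send(f"Review {device}: repeated timing alarms in the review window")


now = datetime(2024, 1, 31, 0, 0)
notify(repeat_offenders(HISTORY, now))
```

A device that alarms repeatedly in a short period is a candidate for inspection before it causes an outage, which is the proactive approach the list above describes.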