The Unnoticed Threat: Network Alarms and the Overlooked Role of Timing

In a January 2024 conversation, Rob Jodrie, Solutions Architect at Syncworks, shed light on the often underappreciated realm of network alarms and on the peculiar reason they don’t receive the attention they deserve. Alarms signal potential network failures, yet historically they have tended to be overlooked. Despite new technology, the issue persists today.

Network engineers now have tools that make it less likely this oversight will end in an entry in the FCC Network Outage Reporting System. However, a threat still looms over many network operators. Aging timing equipment is the most vulnerable link and the one most likely to send out an alarm, and its contact-closure-only method of alarming makes it a danger. These are called “idiot alarms.”

Despite the FCC NORS stigma and the unexpected costs of getting the network back to Stratum 1, network alarms continue to be overlooked. A back study that Mr. Jodrie participated in showed that power and timing were the two main causes found in the required NORS reports. “With all this evidence pointing to a simple solution, there’s still no real good answer as to why they keep going unnoticed,” Mr. Jodrie said. “It’s an old story. It goes like this: ‘Pay attention to your timing alarms.’”

FCC Network Outage Reporting System (NORS)

What is the Network Outage Reporting System (NORS)?

In 2004, the FCC implemented outage reporting rules to ensure swift, comprehensive, and accurate information about significant communication service disruptions with potential impacts on homeland security, public health, safety, and the nation’s economic well-being. Communication providers, including wireline, cable, satellite, wireless, interconnected VoIP, and Signaling System 7 providers, must adhere to these rules. Reportable network outages lasting at least 30 minutes trigger mandatory reporting in the Commission’s Network Outage Reporting System (NORS). Data submitted to NORS is considered confidential.

Depending on the provider type, notifications must be made within specific timeframes: preliminary information within 120 minutes, an initial outage report within three calendar days, and a final report within 30 days of outage discovery. Interconnected VoIP providers follow a similar process, with variations based on the outage’s impact. Covered 911 service providers, responsible for aggregating and delivering 911 traffic, have specific notification timelines and information-sharing requirements when an outage affecting a 911 call center occurs. Source
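To make the timeline concrete, the Python sketch below computes the three deadlines from the moment an outage is discovered, plus the 30-minute duration test mentioned above. It is a simplified illustration under stated assumptions, not compliance guidance: it treats “three calendar days” as 72 hours, ignores the variations for interconnected VoIP and covered 911 service providers, and checks duration only. The function names are hypothetical.

```python
from datetime import datetime, timedelta

# Simplified windows as summarized above, all measured from outage discovery.
# Assumption: "three calendar days" is treated as 72 hours; provider-specific
# variations (VoIP, covered 911 providers) are not modeled.
NOTIFICATION_WINDOW = timedelta(minutes=120)
INITIAL_REPORT_WINDOW = timedelta(days=3)
FINAL_REPORT_WINDOW = timedelta(days=30)
MIN_REPORTABLE_DURATION = timedelta(minutes=30)


def nors_deadlines(discovered_at: datetime) -> dict[str, datetime]:
    """Return the filing deadlines for an outage discovered at `discovered_at`."""
    return {
        "notification": discovered_at + NOTIFICATION_WINDOW,
        "initial_report": discovered_at + INITIAL_REPORT_WINDOW,
        "final_report": discovered_at + FINAL_REPORT_WINDOW,
    }


def meets_duration_threshold(duration: timedelta) -> bool:
    """True if the outage lasted at least 30 minutes (duration test only)."""
    return duration >= MIN_REPORTABLE_DURATION


if __name__ == "__main__":
    discovered = datetime(2024, 1, 15, 9, 0)
    for name, due in nors_deadlines(discovered).items():
        print(f"{name:>15}: {due:%Y-%m-%d %H:%M}")
```

For an outage discovered at 09:00 on January 15, this yields a notification deadline of 11:00 that day, an initial report due January 18, and a final report due February 14.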

NORS and the Odd Reality of Timing

Rob, reflecting on his Tier 2 Technical Support career, highlighted the historical neglect of timing in network operations. He described FCC reportable events, in which outages of a certain size, capacity, or duration had to be reported to the FCC, often resulting in fines and a root cause analysis. Surprisingly, a back study revealed that power and timing were the most common culprits behind these expensive events.
Rob Jodrie:

We used to have to do write-ups on what they call FCC reportable events. I forget the exact specifics, but if you had an outage of a certain size, capacity, or bandwidth down, or of a certain duration, length of time, or both, you’d in some cases have to report that to the FCC, and it was not a good thing. Fines would come about and all this, and they would ask for a root cause analysis. Meaning you’ve got to tell me what caused this issue, you know. And then you were supposed to, you know, come up with a plan to say here’s how we addressed it.

But the point is, we did kind of a back study on that at one point, and we found that the two most common issues causing FCC reportables were power and timing.

So it’s really interesting to me that, you know, these FCC reportables were not small events. They were very, very expensive events, and it has always surprised me how little attention it gets.

Telecom Solutions and BITS Clocks
The conversation turned toward Telecom Solutions BITS clocks, workhorses that have been running for decades. In the early days these clocks used discrete alarms: simple contact closures that provided minimal information. Even today, thousands of these systems are active in the U.S., relying on basic alarming methods that require someone to physically inspect the equipment to determine the issue.

Rob Jodrie:

The interesting thing is, in the early days, and this is still out here in today’s market, these Telecom Solutions BITS clocks, which are workhorses, have been running for literally decades. We put our first one in Portland, ME in 1987, I think. But they have what they call discrete alarms, which means it’s a contact closure, and all it does is send a relay event, if you will, to what they call a scan point.

And it just detects: is there a short across the line, or is there an open? If there’s a short across the line, the words come up: “sync major,” “sync minor,” “sync critical.” That’s it. That’s all you know. You don’t know if you’re in holdover or if the whole thing is down; you have zero idea. It’s just a dumb alarm, and when you saw that alarm you would have to get a pair of hands out there to stand in front of the box, look at the system, and determine from the lights what’s going on.

So it gets kind of interesting, and that is still the case. The next generation of gear, thankfully, is intelligent. We can log on to it and retrieve information from it. We can find out: Is it in holdover? Do I have references? Is there power trouble? What’s the story, without rolling a truck?
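How little a discrete alarm conveys can be sketched in a few lines of Python. This is a hypothetical model, not any vendor’s interface: it assumes each of the three alarm relays is wired to its own scan point, that a short raises the label and an open clears it, and the names Contact and read_scan_points are invented for illustration.

```python
from enum import Enum


class Contact(Enum):
    OPEN = "open"
    SHORT = "short"


# Hypothetical wiring: one scan point per alarm relay on the BITS clock.
SCAN_POINTS = ("sync critical", "sync major", "sync minor")


def read_scan_points(contacts: dict[str, Contact]) -> list[str]:
    """Return the alarm labels the NOC would see from discrete contact closures.

    A short across a pair raises its label; an open (or a missing reading)
    shows nothing. Holdover status, reference status and power state are not
    represented anywhere in this input, so they cannot be reported.
    """
    return [label for label in SCAN_POINTS if contacts.get(label) is Contact.SHORT]


if __name__ == "__main__":
    # The clock could be in holdover or completely down; the NOC cannot tell.
    state = {
        "sync critical": Contact.OPEN,
        "sync major": Contact.SHORT,
        "sync minor": Contact.OPEN,
    }
    print(read_scan_points(state))
```

The entire output is a list of labels such as “sync major”; everything else about the clock has to be read off its front panel by someone on site.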

Evolution of Alarm Systems
Despite the persistence of legacy systems, the next generation of gear brings intelligence to the forefront. With the ability to log in remotely, retrieve information, and diagnose issues without dispatching a truck, modern systems have come a long way. However, the conversation acknowledged the vast number of existing systems that still rely on primitive alarming methods because aging telecom timing equipment, specifically the Telecom Solutions DCD, is still in use.

Rob Jodrie:
So NORS and alarms have come a long way, but there are thousands and thousands of those Telecom Solutions systems still out and active in networks in the United States. It’s hard to believe, but that’s what it is. And sometimes we would find a problem with the alarming itself: that piece of wire that goes from the BITS clock to the scan point, which reports back to the NOC.

Sometimes that wire would get ripped out and there was no alarming. So we’d walk into an office, look at the BITS clock, see it’s in alarm, call the NOC and say, “Why did you not respond to the alarm?” “We don’t see any alarm.” The lack of love for timing has been an interesting thing to note over these years.
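Why a ripped-out wire goes unnoticed follows from how the scan point reads the pair. In this Python sketch, which assumes a normally open contact that closes (shorts) when the clock alarms, as described above, a cut wire produces the same open circuit as a healthy clock. The function names are hypothetical.

```python
def scan_point_reading(relay_closed: bool, wire_intact: bool) -> str:
    """Electrical state seen at the scan point for a normally open alarm pair."""
    # Current flows (a "short") only if the clock's relay has closed AND the
    # wire between the clock and the scan point is still connected.
    if relay_closed and wire_intact:
        return "short"
    return "open"


def noc_view(relay_closed: bool, wire_intact: bool) -> str:
    """What the NOC displays: a short raises the alarm, an open shows nothing."""
    return "ALARM" if scan_point_reading(relay_closed, wire_intact) == "short" else "no alarm"


if __name__ == "__main__":
    cases = [
        ("healthy clock, wire intact", False, True),
        ("clock in alarm, wire intact", True, True),
        ("clock in alarm, wire ripped out", True, False),
    ]
    for description, relay_closed, wire_intact in cases:
        print(f"{description:32} -> {noc_view(relay_closed, wire_intact)}")
```

The last case is the office Rob describes: the clock is alarming locally, but the scan point reads an open circuit, exactly as it would for a clock with nothing wrong.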

As time marches on, I have seen people become more aware of alarms and of the FCC Network Outage Reporting System at times, but for the most part it usually takes a big, service-affecting outage to get their attention.

Only then is there a lot of attention given to it.

How Alarms Manifest Themselves Today

Rob Jodrie:
One of the most common alarming methods in use today is SNMP. It stands for Simple Network Management Protocol, and it’s an intelligent thing. You have an SNMP server located somewhere centrally, and you point this intelligent equipment back to it via an IP address scheme. So when an intelligent device runs into a problem, say it has lost its GPS signal, it sends an intelligent piece of information back to the SNMP server. The server then translates it and gives actionable, almost pinpoint information along the lines of “this piece of equipment at this location has this issue, and now somebody needs to respond to that.” With SNMP you have a lot more to go on than a “sync major” red light.
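The difference in what reaches the NOC can be sketched in Python. Real SNMP notifications carry variable bindings identified by OIDs defined in the equipment vendor’s MIB; the TimingTrap fields below are hypothetical stand-ins for that content, and to_actionable is an illustrative version of the translation step, not any product’s API. The address comes from the 192.0.2.0/24 documentation range.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimingTrap:
    """Hypothetical, simplified content of an SNMP notification from a timing device.

    Real traps identify each value by an OID from the vendor's MIB; these
    field names are illustrative only.
    """
    device: str
    agent_ip: str
    location: str
    condition: str
    received_at: datetime


def to_actionable(trap: TimingTrap) -> str:
    """Translate a trap into the kind of message a NOC operator can act on."""
    return (
        f"{trap.received_at:%Y-%m-%d %H:%M} {trap.device} ({trap.agent_ip}) "
        f"at {trap.location}: {trap.condition}"
    )


if __name__ == "__main__":
    trap = TimingTrap(
        device="BITS-01",
        agent_ip="192.0.2.10",
        location="Central Office A",
        condition="GPS reference lost, in holdover",
        received_at=datetime(2024, 1, 15, 9, 0),
    )
    print(to_actionable(trap))
```

Unlike a contact closure, the message names the device, where it is, and what is wrong, which is what lets someone respond without first sending a pair of hands to look at the lights.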

Now, when someone calls our SyncCare support line and says, “I’m in holdover, I’ve lost GPS,” those events are noted in the alarm logs, so you can go back and ask: When did this happen? Has it been a week, two weeks? What events were there? There’s just more information you can collect, and it’s usually all done via SNMP.
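Answering “when did this happen?” from an alarm log is a simple scan. This Python sketch assumes a hypothetical log of (timestamp, event) pairs in chronological order, with made-up event names “holdover entered” and “holdover cleared”; actual equipment logs differ by vendor and model.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical alarm-log format: (timestamp, event) pairs, oldest first.
AlarmLog = list[tuple[datetime, str]]


def holdover_started(log: AlarmLog) -> Optional[datetime]:
    """Return when the current holdover episode began, or None if not in holdover.

    A 'holdover entered' event opens an episode (repeats while already in
    holdover are ignored, so the earliest entry is kept); a 'holdover cleared'
    event closes it.
    """
    start: Optional[datetime] = None
    for ts, event in log:
        if event == "holdover entered" and start is None:
            start = ts
        elif event == "holdover cleared":
            start = None
    return start


def holdover_duration(log: AlarmLog, now: datetime) -> Optional[timedelta]:
    """How long the device has been in holdover as of `now`, or None."""
    start = holdover_started(log)
    return None if start is None else now - start


if __name__ == "__main__":
    log: AlarmLog = [
        (datetime(2024, 1, 1, 8, 0), "GPS signal lost"),
        (datetime(2024, 1, 1, 8, 5), "holdover entered"),
    ]
    print(holdover_duration(log, now=datetime(2024, 1, 15, 8, 5)))
```

Here the log shows the clock has been in holdover for exactly two weeks, the kind of answer that a “sync major” light alone can never give.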

Rob Jodrie

Technical Support Engineer, Syncworks

Rob started working in the telecommunications industry with the Bell System in 1982. He was responsible for Tier II Network Synchronization and Transport Technical Support at Verizon for fourteen years and has been with Syncworks since 2015.