Everyone’s talking about downtime

Last week, BT suffered a nationwide network outage that affected tens of thousands of its broadband and phone customers. The outage lasted for two hours, with failures reported across the UK, hitting business hubs such as London, Coventry, Sheffield and Glasgow.

It was critical for BT to respond quickly, because the length of an outage ultimately shapes how customers judge a company’s reliability. Even so, the failure drew widespread criticism online and in the media, and the conversation continued over the days that followed.

While news outlets were quick to speculate on the cause of the outage, BT confirmed that the problem had originated with a single faulty router.

But this isn’t the first time an outage has hit the headlines, and given the reaction from customers and the wider media, it is clear that companies need to consider the reputational and financial repercussions of such failures.

From the moment a network outage is reported to the moment connectivity is restored, operators are fighting against time. Every passing minute shapes how customers will react. If an organisation swiftly identifies and rectifies a problem, it is widely praised by customers and commentators alike. The longer the outage, however, the worse the backlash.

BT’s outage also highlights how completely companies now rely on digital networks and infrastructure. Avoiding and handling outages is becoming a daily battle for network managers, whose networks must rapidly process larger volumes of data than ever before. Ultimately, the episode brings to the forefront a broader trend: networks need to deal with a tsunami of data from multiple streams, and clearly some are struggling to do so.

There are, however, steps network managers can take to swiftly identify and shut down faulty routers.

  • A carrier such as BT should have multiple levels of redundancy, so that if a single router fails, it does not trigger an extended network failure
  • Operators should be able to identify a problem device within minutes and remove it from the network; often, however, the delay stems not from the network technology itself but from legacy monitoring and management systems that struggle to pinpoint the issue

What is clear from this outage is that companies providing digital services need to invest in robust network management systems that enable them to act efficiently and cost-effectively. Any system worth its weight in gold should be able to scale at speed.

But while it is easy to be intimidated by the headlines, network managers should see this outage as an opportunity. They need to take heed and analyse their own infrastructure and monitoring systems to ensure they are delivering the most reliable service possible to their loyal customers.

Tom Griffin, Managing Director, EMEA, SevOne
