If recent economic conditions have demonstrated one thing, it is that uncertainty of any kind is bad for business. The worst situation for any organisation is one that takes it by surprise, and system downtime is exactly that kind of event: it can strike unexpectedly, often with little to no forewarning.
For many, system downtime can lead to very expensive complications. Service disruption can result in lost business, data loss and reputational damage. For the Fortune 1000, just a single hour of infrastructure failure costs on average $100,000 (approximately £82,000), with the average cost per year sitting between $1.25bn and $2.5bn.
For many, the greatest problem posed by downtime is not the short-term financial damage but the long-term reputational damage, which is difficult to quantify but nevertheless significant. A recent IT system crash at Leeds Teaching Hospitals forced management to postpone 113 patient operations. Beyond the immediate operational disruption, the hospital will find it hard to recover the confidence that patients lost as a result of the failure.
The ability to resolve downtime issues as fast as possible is essential. This must be supported by efforts to limit the chances of downtime happening in the first place. Taking this two-pronged approach will help ensure that both the shock and the cost of downtime are minimised.
Hitting the Limit
Understanding how your organisation operates and moves data is a key component of limiting downtime. You can reduce the cost of data loss by identifying the areas that require the most protection, so start by looking at patterns. Critical questions to consider include:
- Where is your sensitive data stored and how safe is it?
- Where are the greatest network vulnerabilities?
- Which areas of the organisation are the most important to keep up and running?
- Are there specific systems or applications that your business is most reliant on?
Taking this proactive approach will help you better protect the core tools and data from any planned or unplanned downtime.
Often the most frustrating unplanned downtime scenario is one you cannot control: a critical supplier’s servers go down, and little can be done from the customer’s side to influence or resolve the situation. As a result, it is essential to have service-level agreements in place with every key vendor and partner your organisation works with.
Any quality enterprise IT product will come with an agreed level of availability, but make sure you understand what that figure really means. An SLA promising 99 per cent uptime still allows for more than seven hours of downtime in just 30 days. If the contracted amount of allowable downtime doesn’t meet business needs, or if availability is not high enough, changes should be made.
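To see how an availability percentage translates into allowable downtime, a quick back-of-the-envelope calculation helps (a generic sketch, not tied to any specific vendor's SLA terms):

```python
def allowed_downtime_hours(uptime_pct, period_days=30):
    """Maximum downtime an SLA permits over the given period."""
    total_hours = period_days * 24
    return total_hours * (1 - uptime_pct / 100)

# 99% uptime over a 30-day month still permits 7.2 hours of downtime.
print(round(allowed_downtime_hours(99), 2))    # 7.2
# Each extra "nine" cuts the allowance by a factor of ten.
print(round(allowed_downtime_hours(99.9), 2))  # 0.72
```

The same arithmetic works in reverse when negotiating: decide how many hours of outage the business can tolerate per month, then back out the uptime percentage the contract must guarantee.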
Tools able to minimise the effects of downtime are available. Managed file transfer solutions with features designed to provide high availability can help minimise downtime when systems are taken offline for maintenance, or during an unexpected outage, ensuring mission-critical operations can keep functioning.
Reducing Risk and Planning for Potential Downtime
Ultimately, preventing all downtime is a difficult, ongoing challenge. Even in the most well-maintained IT systems, things can still go wrong. Think about how your IT networks operate, and whether the failure of any single critical system would result in widespread downtime. It is almost impossible, and certainly cost-prohibitive, to guarantee that every piece of software and hardware will keep working at all times.
Anticipating issues, and planning how to address them, is one key way to reduce the risk of downtime. Another is to build an environment in which availability can be maintained even when individual components fail.
Introducing active-active clustering is a positive first step. It balances workloads across multiple servers, which limits the risk of downtime by ensuring that no single piece of hardware gets overloaded. Some organisations use active-passive clusters instead, but these only provide a failover system in the event of an issue. In some cases this may not prevent downtime, as Delta Airlines experienced recently when its systems failed to come back up properly during an outage that triggered a failover.
Building an infrastructure that adapts quickly and scales easily will pay dividends, helping your organisation stay ahead of its own growth. Even when downtime is planned, it still hinders productivity and carries a cost. It is therefore critical to ensure that the infrastructures within your organisation are, firstly, balanced to cope with demand and, secondly, easily scalable with business needs, limiting both planned and unplanned downtime.
Prevention of downtime therefore needs to take a horizontal approach: scaling when required and, in turn, maximising uptime. Using technology solutions which manage demand across servers can support uptime by regularly redistributing transactions, depending on requirements and changes. A quality load balancing tool will enable coordination of workflows and full synchronisation between nodes. At the same time, it should be relatively simple to install, configure and manage.
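As a minimal illustration of the active-active idea (a sketch, not any particular product, with invented node names), a round-robin balancer that skips unhealthy nodes shows how the pool keeps serving traffic when one node drops out:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Sketch of active-active distribution: every healthy node
    serves traffic, so one failure simply shrinks the pool."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.healthy = set(nodes)
        self._ring = cycle(nodes)

    def mark_down(self, node):
        self.healthy.discard(node)

    def mark_up(self, node):
        self.healthy.add(node)

    def next_node(self):
        # Walk the ring, skipping nodes currently marked unhealthy.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")

lb = RoundRobinBalancer(["node-a", "node-b", "node-c"])
lb.mark_down("node-b")
# Traffic continues, alternating between the two surviving nodes.
print([lb.next_node() for _ in range(4)])
# ['node-a', 'node-c', 'node-a', 'node-c']
```

Contrast this with active-passive: there, the standby node carries no traffic until a failover fires, and, as the Delta outage showed, the moment of failover is itself a point of failure.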
Identifying A Solution
Identifying potential vulnerabilities and getting support can be more complicated if your organisation uses multiple vendors for the same or similar services. Standardising on a single vendor can simplify the process considerably: systems and workloads across the organisation are compatible rather than competing, vulnerabilities can be identified and addressed early, and the resulting networks are easier to manage in a crisis.
In short, downtime is always going to be an issue that needs to be managed and planned for, but steps can be taken to limit its effects across the organisation or with customers or partners.
Greg Hoffer is Vice President of Engineering at Globalscape