What have we learned from NatWest's banking blunder?

A software glitch at NatWest on 19 June left up to 12 million NatWest, Royal Bank of Scotland and Ulster Bank customers without access to expected payments and up-to-date account balances. A week after the error, some customers, particularly those at Ulster Bank, are still unable to access funds that are rightfully theirs, racking up overdraft fees and late charges and, for some, suffering inconveniences that cannot be undone.

RBS has said that, due to “significant stress,” its banks will not return to “completely normal service” before 2 July.

Though the crisis is thought to be the farthest-reaching and longest-lasting in UK online banking history, it isn’t the first time an IT error has caused an outage at a financial institution that directly affected customers. And unless NatWest and other major companies learn the lesson and implement changes accordingly, it won’t be the last.

Last November, NatWest and RBS customers were faced with a similar failure in online and mobile banking, when a technical glitch prevented payments from going through and balances from being updated. During the same month, HSBC customers were locked out of cashpoint machines and online banking, also thanks to an IT error.

NatWest and its parent company RBS have yet to explain exactly what happened last week. But some sources, speaking to the Guardian and other publications, pointed to an error in the CA-7 automated batch processing software used by NatWest to manage retail banking transactions. An attempt to update the software failed, leaving a backlog of transactions that had to be reprocessed and setting off a domino effect that left a still-unspecified number of people cut off from their accounts.
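For readers who want a sense of the mechanics, the short Python sketch below is a toy model of dependency-driven overnight batch processing. It is emphatically not CA-7’s real interface - the scheduler and job names are invented for illustration - but it shows why losing or corrupting a scheduler’s state can force whole nights of postings to be replayed in order before balances are correct again.

```python
# Hypothetical illustration only - this is not CA-7's real interface, just a
# toy model of dependency-driven batch scheduling.
from collections import deque
from dataclasses import dataclass, field

@dataclass
class BatchJob:
    name: str
    depends_on: list = field(default_factory=list)
    done: bool = False

def run_schedule(jobs):
    """Run jobs in dependency order: each step feeds the next."""
    queue = deque(jobs)
    while queue:
        job = queue.popleft()
        if all(dep.done for dep in job.depends_on):
            print(f"running {job.name}")
            job.done = True
        else:
            queue.append(job)  # dependencies not finished yet; retry later

# One night's (heavily simplified) retail-banking batch chain.
ingest   = BatchJob("ingest_payment_files")
post     = BatchJob("post_transactions", [ingest])
balances = BatchJob("update_balances", [post])
stmts    = BatchJob("generate_statements", [balances])
run_schedule([ingest, post, balances, stmts])

# If a botched update corrupts this state (the 'done' flags and the queue),
# every missed night must be replayed in order before customer balances are
# trustworthy again - which is how a backlog snowballs across days.
```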

Bank of England governor Mervyn King has said the Financial Services Authority should investigate the outage.

“Once the current difficulties are over, then we will need the FSA to go in and carry out a very detailed investigation to find out firstly what went wrong, and then perhaps even more importantly why it took so long to recover,” King said, speaking to Parliament on 26 June. “Computer systems will always go wrong from time to time. The important things are…backup systems and the time it takes to implement recovery.”

Speaking to Channel 4, RBS CEO Stephen Hester made a similar comment. “Technology breaks down; it’s how you react to it [that matters],” he said.

The RBS boss’s statement to Channel 4 sums up the essence of the problem - that there were two errors: the initial software update blunder and then, more disastrously, the failure to address it adequately. But Hester’s mistake, one which reflects the broader issue affecting large institutions like NatWest, is in thinking that the reaction to the original glitch went “the way that [he] would have hoped.”

To begin with, stricter and more effective safeguards should have been in place to prevent the original error. Companies operating at NatWest’s scale - particularly when consumer finances and personal data are at risk - should have IT processes that prioritise extensive testing before rolling out new software or updates to existing systems.

“This was not inevitable – you can always avoid problems like this if you test sufficiently,” David Silverstone, a delivery and solutions manager for NMQA, a firm that delivers automated testing software to banks, told the Guardian. “But unless you keep an army of people who know exactly how the system works, there may be problems maintaining it.”
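To make “testing sufficiently” slightly more concrete, here is a minimal, hypothetical regression-style check in Python - the posting routine and figures are invented, not NatWest’s - of the kind run against a copy of a batch system so that a deployment is blocked if an update changes a known-good result.

```python
# Illustrative regression-style check, run before an update goes live.
# The routine and figures are invented for the example, not NatWest's code.

def post_transactions(opening_balance, transactions):
    """Toy posting routine: apply a day's credits and debits (in pence)."""
    return opening_balance + sum(transactions)

def test_posting_matches_known_good_output():
    # Fixed fixture with a known-correct answer: if an update changes the
    # result, the rollout is halted before customers ever see it.
    opening = 1000_00
    txns = [250_00, -75_50, -120_00]
    assert post_transactions(opening, txns) == 1054_50

if __name__ == "__main__":
    test_posting_matches_known_good_output()
    print("regression check passed")
```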

Of course, even with testing, it’s possible for technology to break down. But for those rare situations, NatWest and other large institutions should have comprehensive backup and recovery mechanisms that take effect immediately. Though it may be some time before we understand the full extent of the error from a technological standpoint, it’s apparent that whatever back-up and recovery systems were in place proved slow, inadequate, or both.
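Purely as a sketch, and making no claim about how RBS actually deploys changes, the snippet below shows the shape of an immediate recovery path: keep the previous known-good version, verify an update as soon as it is applied, and revert automatically when the checks fail, rather than untangling the damage by hand over several days.

```python
# Illustrative sketch only - not RBS's actual deployment process. The idea is
# that recovery should be automatic and immediate, not a multi-day manual job.

def apply_update(version, checks_pass):
    """Pretend to install a scheduler version and run post-deployment checks."""
    print(f"applying {version}")
    return checks_pass

def deploy_with_rollback(new_version, new_checks_pass, previous_version):
    """Roll out an update; fall back to the known-good version on failure."""
    if apply_update(new_version, new_checks_pass):
        print("checks passed; keeping the update")
        return new_version
    print(f"checks failed; reverting immediately to {previous_version}")
    apply_update(previous_version, True)
    return previous_version

# A broken update should trigger an instant, automatic revert.
running = deploy_with_rollback("scheduler_v2", new_checks_pass=False,
                               previous_version="scheduler_v1")
print(f"now running: {running}")
```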

Silverstone also points to a problem that others, including the Unite union, blame for the error - a round of layoffs that saw UK-based IT support staff replaced by CA-7 specialists in India. A number of sources have blamed that shortage of experienced UK IT staff for the incredibly slow and ineffective reaction to the original glitch. The fewer capable staff on hand, the more disjointed and fragile any recovery effort will be.

But the implications of outsourcing IT positions overseas aren’t simply geographical in nature - in corporations of all types, ‘offshoring’ points to a trend of cutting corners and trying to reduce costs in IT, despite the potentially disastrous effects. As we’ve seen with this particular outage, a company’s business continuity is only as strong as its weakest IT link.

Banking Technology editor David Bannister told Channel 4 that NatWest’s system - and presumably others like it - is decades old and has been modified over time, with updates shoddily tacked on when the entire system should have been replaced.

Decisions like outsourcing crucial positions overseas and relying on aging technology suggest that IT is under-prioritised by the likes of NatWest and RBS. There is often a disconnect between decision-makers and the people with practical knowledge of the realities of IT, resulting in problems such as staff shortages and inadequate systems groaning under the weight of the demands placed on them.

If this latest crisis has proven anything, it’s that IT is not something that should ever be compromised. If NatWest and other companies in similar circumstances are to prevent a repeat of this altogether avoidable breakdown, they must commit to re-examining and thoroughly addressing the weaknesses of their IT systems.