Interview: Ian Masters from Vision Solutions on the demise of 2e2, the cloud and disaster recovery

We interviewed Ian Masters, Sales Director for Northern Europe at data management solutions provider Vision Solutions, to find out more about the company's plans, some of the pitfalls associated with moving platforms, the 2e2 episode, and using cloud computing for disaster recovery.

What are some of the unforeseen problems of moving your backend IT to a new platform?

It's nice making one of those big strategic decisions – you can see the return on investment models, you have set out and managed the potential risk around suppliers, and everything is due to come together nicely. Then you look at the actual migration schedule involved and suddenly things don't look so rosy. Aside from any capital investment and the set-up of new hardware and software, you also have to factor in how long it can take to move from the old platforms to the new ones.

Even when you are virtualised, significant downtime challenges can arise. In many cases, it's not as simple as turning off one set of kit and powering on another.

One of the biggest problems around migration is the sheer amount of time it can take to get new systems imaged and up to date with existing workloads. The traditional approach is to snapshot all the current workloads and then move those images over. However, during that moving-over process you can't create any new data, or the snapshots would be out of date. This process can take days to complete – even if you have staff doing it out of hours, there is still an impact on both business service and the overall cost of migration.

An alternative here is to look at replication technologies instead. The system imaging process is the same, but while the migration is taking place the systems being imaged can stay online. Replication captures any changes to those live systems and queues them up to be applied to the new systems when they are ready to be implemented. This cuts the overall migration and downtime window significantly and helps to keep the cost of a migration down as well.
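To make the idea concrete, here is a minimal sketch of replication-assisted migration, not a description of any particular vendor's product. The class names, the in-memory "data" and the change queue are all hypothetical illustrations of the approach described above: take a base image once, journal every live write while the business keeps running, and drain the queued changes at cutover.

```python
# Hypothetical sketch of replication-assisted migration (names are illustrative).
from collections import deque

class SourceSystem:
    """A live system that keeps serving writes during the migration."""
    def __init__(self, data):
        self.data = dict(data)
        self.changes = deque()             # change journal captured by replication

    def write(self, key, value):
        self.data[key] = value
        self.changes.append((key, value))  # every live write is also journalled

class TargetSystem:
    """The new platform being brought up from a base image."""
    def __init__(self):
        self.data = {}

    def restore_base_image(self, image):
        self.data = dict(image)            # the slow step: copy the initial image

    def apply_changes(self, changes):
        while changes:                     # drain the queued changes at cutover
            key, value = changes.popleft()
            self.data[key] = value

source = SourceSystem({"orders": 100})
base_image = dict(source.data)             # take the image once

source.write("orders", 101)                # business keeps running meanwhile
source.write("invoices", 7)

target = TargetSystem()
target.restore_base_image(base_image)
target.apply_changes(source.changes)       # short cutover window
assert target.data == source.data
```

The point of the sketch is the ordering: the long-running image copy happens while the source stays online, and only the small backlog of journalled changes has to be applied during the downtime window.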

The other big challenge for migrations is moving from one set of systems to another. This can involve moving from one hypervisor to a new one, or from one storage provider to a different option. Avoiding cost and complexity during a migration like this means planning ahead and understanding all the moving parts that are involved.

With the demise of 2e2 earlier this year, is cloud a viable and trustworthy option for businesses to seriously invest in?

While what took place at 2e2 was very challenging for all the individuals and businesses involved, I actually think it was good for the cloud industry as a whole that an issue like this came up this year. It encouraged IT managers who were seriously considering a move to the cloud to rethink what they were doing. Selecting a cloud provider is a big decision in terms of the technology that underpins the platform that you are looking at, but it also requires more business-level analysis as well. You can have the best-run cloud technology in place, but if you can't answer fundamental questions around business continuity planning, succession planning, or service levels and procedures in the event of something going wrong, then you're not meeting the needs of your customers.

In the months after the 2e2 situation unfolded, I spoke to a number of cloud service providers about their own DR strategies and where they see take-up of their offerings going. One comment that came out was around how much they are working together, both to re-establish some of the trust in the cloud model and, at a very practical level, to protect each other's customer data as well. This tells me that these companies are serious about cloud and the value that it can provide, as their own business continuity planning and support strategies are being entrusted to other cloud providers too.

Why is cloud an interesting option for disaster recovery?

The whole premise of cloud is that you don't have to own the infrastructure in order to benefit from a service. This overcomes one of the biggest challenges for companies around DR, in that it is often perceived as a 'nice to have' insurance policy rather than a must-have.

Going down the cloud route for DR gets around this as companies can take a different approach to how they protect their data. Rather than investing upfront in something that you hope never to use, you can spread payments over the length of time that you use the service. If you do need to invoke your DR strategy, then the cloud can host those critical servers for you and employees can continue working while any issue is dealt with.

There is one important point to remember about shifting your DR strategy over to the cloud: how you do this can depend on the IT platform choices that you have made, and those of the potential cloud service providers that you might evaluate. Some cloud services would essentially be extensions to your existing virtualisation or private cloud strategy: fine if you are on the same virtualisation platform as your service provider, but not suitable if you have a mixed estate or different hypervisor in place.

Understanding how your data gets into the cloud securely, and then how it gets used as part of a recovery plan, is essential to making sure that it is effective. Looking at the mechanics of recovery can help avoid unnecessary downtime.

Is downtime really that much of a problem to businesses?

I'd argue that it is a real problem. The pain and cost that can be experienced during a problem are real, and the sheer amount of work required to get back to a "known good" state is not something to be dismissed out of hand.

Downtime during a disaster has multiple impacts: there is the time taken to recover, the cost of restoring lost data, the cost of the staff time and effort that goes into a recovery task rather than into creating business value, and the potential for lost business as clients turn to competitors.

However, I'd also argue that looking at how to make your business more resilient against failure can be an opportunity to rethink some of your existing processes and try to streamline them. For example, using options like cloud computing can provide better services at lower cost to fill some of the IT requirements that a company may have, rather than spending much-needed capital. Similarly, looking at cloud as a way to test that data is being protected and that the recovery process is working should not just be viewed as an expense: a fail-over test can be used as a way to keep systems up and running for front-line staff while necessary updates or other changes are carried out.

How will IT infrastructure fare with the increasingly diverse array of technology in use?

Diversity should be welcomed. If you commit to only one approach or only one vendor, then you are tied to that approach going forward. If your business situation changes, then it gets more difficult to change.

Look at storage. Many organisations commit themselves to full storage refresh projects every three or four years, even when that storage hardware still has many years of useful life left in it. Re-using that storage for DR purposes is a great way to save on costs and keep your options open when it comes to refresh projects.

The same is true with cloud: most organisations will be taking a hybrid approach to cloud computing as they look to make the most of what is available to them. The main point is that the cloud market is evolving at a rapid pace, and everyone is wary of making decisions that would see them over-commit to a particular platform and be unable to get back out again.

What will be interesting to track is how IT systems increasingly move into software: we have had this with server virtualisation and storage, and it is now taking place at the network level. There are still some platforms where everything from the data, apps and operating system through to the hardware remains tightly coupled together – take the mainframe as an example – but increasingly the requirements for power and the opportunities for innovation are moving to the software level.

What is the difference between the available technology options, like snapshotting and replication, and how do they affect the IT infrastructure?

Snapshotting involves taking pictures of your data at regular intervals and then sending those over to a secondary location. If not much changes between snapshots, then the overhead is low and the time to recovery should be fairly low as well.

However, snapshots rely on having the same platforms at each end – compatible storage and servers, and/or the same hypervisor at the secondary site as in the production environment. Even in DR environments where you are taking many different production server images and pointing them at a single recovery server and storage, the same hypervisor typically has to be used at both ends.
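As a rough illustration of the trade-off just described, here is a minimal sketch of interval-based snapshot protection. It assumes a simple in-memory "volume"; the class name, the schedule and the data are all hypothetical, and it deliberately ignores the platform-compatibility constraints mentioned above.

```python
# Hypothetical sketch of interval-based snapshot protection (names are illustrative).
import copy

class SnapshotProtection:
    def __init__(self, volume):
        self.volume = volume            # production data (a dict here)
        self.secondary = []             # snapshots shipped to the DR site

    def take_snapshot(self):
        # A point-in-time copy; anything written after this moment is at risk
        # until the next snapshot is taken and shipped.
        self.secondary.append(copy.deepcopy(self.volume))

    def recover(self):
        # Recovery can only roll back to the last shipped snapshot,
        # so the snapshot interval effectively sets your recovery point.
        return copy.deepcopy(self.secondary[-1])

volume = {"orders": 100}
dr = SnapshotProtection(volume)
dr.take_snapshot()                      # e.g. run on an hourly schedule

volume["orders"] = 150                  # changes made after the snapshot...
recovered = dr.recover()
print(recovered["orders"])              # ...are lost on recovery: prints 100
```

The sketch shows why the snapshot interval matters: everything written between the last shipped snapshot and the failure is gone when you recover.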

Replication takes a different approach, as it concentrates on what is happening at the data and application level rather than at the storage or hypervisor level. By capturing all the changes that are taking place and moving them over to the second site in real time, you can have exact replicas of the data without having to have those identical systems in place.

The main benefit of this is that you can move data from one system to another and still keep it protected. You could have a production server sitting on an EMC SAN and a VMware hypervisor that is replicating data over to a secondary system based on Microsoft Hyper-V using a NAS or direct-attached storage: the data would still be protected and available.
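To illustrate the contrast with the snapshot sketch above, here is a minimal sketch of data-level replication between two dissimilar "platforms" that share nothing but a simple apply-write interface. The class names and the interface are hypothetical, not any vendor's API; the point is only that the replication happens at the data layer, so the two ends do not need to run the same hypervisor or storage.

```python
# Hypothetical sketch of data-level replication across dissimilar platforms.
class ProductionServer:
    """Could sit on any hypervisor or storage; replication works on writes."""
    def __init__(self, replica):
        self.data = {}
        self.replica = replica

    def write(self, key, value):
        self.data[key] = value
        self.replica.apply_write(key, value)   # ship each change in real time

class RecoveryServer:
    """A different hypervisor / storage stack; only the data layer matters."""
    def __init__(self):
        self.data = {}

    def apply_write(self, key, value):
        self.data[key] = value

recovery = RecoveryServer()
production = ProductionServer(recovery)
production.write("orders", 100)
production.write("invoices", 7)
assert recovery.data == production.data        # exact replica, no shared platform
```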