Zombie nation: When idle virtual servers come back with a vengeance

One of my first jobs in the IT world saw me at the forefront of promoting virtualisation to an army of IT guys who probably still owned copies of the Encyclopedia Britannica.

As with the onslaught of the Industrial Revolution, these cloth-making, felt-hat-wearing Luddites needed to accept that change was upon us, and there was nothing they could do to stop it.

As part of being an ambassador for virtualisation, I’d been tasked with installing ESX 3 for the first time. As you’ll know, it has since been succeeded by ESXi, the hypervisor that is installed directly onto your physical server to run the virtual infrastructure. As expected, this was met with opposition from the old timers.

Two years later, we’d consolidated down to half our space and dropped the empty racks from our cage lease. It was about then that I learned about the problem of virtualisation sprawl. At first, I thought VM sprawl sounded like a powerful green moss. However, we were to find out the hard way that it refers to a situation in which the number of virtual machines has grown, for whatever reason, to the point where they can no longer be managed effectively.

Naively, I’d assumed that, as with physical boxes, the admins would delete old virtual servers when they were no longer serving a purpose. What I didn’t know was that these guys figured so long as they had storage, they’d just shut them down and leave them on the disk, dormant.

Unbeknown to me, this went on for months. It wasn’t until a storm brought about a 30-minute power cut to our uninterruptible power supply that these issues rose to the surface.

When everything did eventually begin to work again, there was utter chaos as lots of previously undead VMs, asleep but set to auto-start, woke up simultaneously, with conflicting app connections to databases and duplicate IPs.

This isn’t something we should have to put up with in 2015. Fortunately for the IT Pro today, there are a number of tools we can put in place to stop this from happening. In addition to capacity planning, which is vitally important, running regular inventory reports is a great way to get visibility of just what is going on. A detailed inventory of all VMs can show, amongst other things, the RAM, disk and CPU allocated to each machine.
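To make the inventory idea concrete, here is a minimal sketch in Python of what such a report boils down to. It is not tied to any particular product: the `VM` record, its field names and the sample machines are all made up for illustration, and in practice the data would come from whatever your virtualisation platform exports.

```python
from dataclasses import dataclass


@dataclass
class VM:
    """Hypothetical inventory record; field names are assumptions."""
    name: str
    power_state: str  # "on" or "off"
    ram_gb: int
    disk_gb: int
    vcpus: int


def inventory_report(vms):
    """Return a plain-text table with one row per VM plus a totals row."""
    rows = [f"{'NAME':<12}{'POWER':<7}{'RAM_GB':>8}{'DISK_GB':>9}{'VCPUS':>7}"]
    for vm in sorted(vms, key=lambda v: v.name):
        rows.append(
            f"{vm.name:<12}{vm.power_state:<7}"
            f"{vm.ram_gb:>8}{vm.disk_gb:>9}{vm.vcpus:>7}"
        )
    rows.append(
        f"{'TOTAL':<12}{'':<7}"
        f"{sum(v.ram_gb for v in vms):>8}"
        f"{sum(v.disk_gb for v in vms):>9}"
        f"{sum(v.vcpus for v in vms):>7}"
    )
    return "\n".join(rows)


def powered_off_disk_gb(vms):
    """Disk space held by VMs that are shut down but still on the datastore."""
    return sum(vm.disk_gb for vm in vms if vm.power_state == "off")


# Made-up sample data, purely for illustration.
SAMPLE = [
    VM("web-01", "on", 8, 100, 2),
    VM("db-old", "off", 16, 500, 4),
    VM("test-07", "off", 4, 80, 1),
]

if __name__ == "__main__":
    print(inventory_report(SAMPLE))
    print(f"Disk held by powered-off VMs: {powered_off_disk_gb(SAMPLE)} GB")
```

Even a report this crude would have flagged the shut-down machines that were quietly sitting on our disks.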

A virtualisation manager tool can also look at the number of idle, stale and zombie VMs inside clusters. It can identify exactly how much resource they are using, and drill down further to those zombies with low CPU and memory usage and very little network throughput or IOPS going through them.
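The drill-down described above amounts to a simple filter: a VM is a zombie candidate only if every one of its usage figures is low. The Python sketch below shows that logic under stated assumptions. The `VMUsage` record, the metric names and the threshold values are hypothetical, not taken from any specific tool, and sensible thresholds will differ from one environment to the next.

```python
from dataclasses import dataclass


@dataclass
class VMUsage:
    """Hypothetical averaged usage figures for one VM over a review period."""
    name: str
    avg_cpu_pct: float
    avg_mem_pct: float
    net_kbps: float
    iops: float


# Assumed cut-offs for illustration only; tune these for your own estate.
THRESHOLDS = {
    "avg_cpu_pct": 5.0,
    "avg_mem_pct": 10.0,
    "net_kbps": 50.0,
    "iops": 10.0,
}


def is_zombie_candidate(vm, thresholds=THRESHOLDS):
    """True only when every tracked metric is below its threshold."""
    return all(getattr(vm, metric) < limit for metric, limit in thresholds.items())


def zombie_candidates(vms, thresholds=THRESHOLDS):
    """Names of VMs that look idle on CPU, memory, network and storage."""
    return [vm.name for vm in vms if is_zombie_candidate(vm, thresholds)]


# Made-up sample data, purely for illustration.
USAGE = [
    VMUsage("web-01", 35.0, 60.0, 1200.0, 300.0),  # clearly busy
    VMUsage("db-old", 1.2, 4.0, 3.0, 0.5),         # quiet on every metric
    VMUsage("batch-02", 2.0, 8.0, 5.0, 400.0),     # low CPU but heavy disk I/O
]

if __name__ == "__main__":
    print(zombie_candidates(USAGE))
```

Requiring all four metrics to be low is a deliberately cautious choice: a machine like `batch-02` that barely touches the CPU but hammers the disk is still doing something, so it stays off the list until a human has looked at it.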

This level of foresight has made it ‘virtually’ impossible for me to be yelled at for other people’s mistakes, for which I am eternally grateful.

Kent Row is an IT Admin and Superhero at SolarWinds.