
Right-size your multi-cloud workloads using AI but don’t stop there

(Image credit: Shutterstock/everything possible)

The only thing better than knowing what your IT stakeholders are doing right now is knowing what they're going to do before they do it. Much like the adoption of autonomous vehicles promises to transform how we deal with traffic, the use of predictive analytics in IT is fundamentally changing the way businesses operate.

Hybrid IT is one area that has embraced embedded AIOps as a mechanism for efficiently provisioning cloud workloads.

 The difficulty of calculating cloud total cost of ownership 

One key bottom-line calculation for where to run a given workload is the total cost of ownership (TCO) of running on-premises vs. in a hybrid or multi-cloud scenario. It’s easy to end up with an apples-to-oranges comparison of capital expenses (hardware/software/maintenance) vs. operational expenses (cloud services for compute, storage, and other resources, plus app rebuild and migration costs).

Miscalculating TCO can cause companies to spend more than they expect on cloud migrations, services, and maintenance. The first step in determining cloud TCO is assessing the workload requirements of your applications in three categories:

  • Predictability of the workload's demands for cloud resources, including migration costs and recurring costs
  • Flexibility of a particular service to accommodate the application's unique dependencies, or whether several services should be combined to meet the application's resource needs
  • Control to ensure you have a clear and complete view of the application's use and operation, including being able to adjust quickly to changing conditions and user requirements
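As a back-of-the-envelope illustration of the capex-vs-opex comparison, the calculation might be sketched as follows. All figures and function names here are made-up assumptions for illustration, not real pricing:

```python
# Hypothetical TCO comparison: on-premises vs cloud over a planning horizon.
# Every number below is an illustrative assumption.

def on_prem_tco(hardware, software, annual_maintenance, years):
    """Capital expense up front plus recurring maintenance."""
    return hardware + software + annual_maintenance * years

def cloud_tco(monthly_services, migration, rebuild, years):
    """One-off migration/rebuild costs plus recurring service fees."""
    return migration + rebuild + monthly_services * 12 * years

on_prem = on_prem_tco(hardware=120_000, software=40_000,
                      annual_maintenance=25_000, years=3)
cloud = cloud_tco(monthly_services=6_000, migration=30_000,
                  rebuild=20_000, years=3)

print(on_prem)  # 235000
print(cloud)    # 266000
```

Even this toy model shows how one-off migration and rebuild costs can tip the balance over a short horizon; a longer horizon or different service mix changes the answer.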

Being able to assign and track costs to specific users, applications, and clouds is one piece of the puzzle that a modern cloud management platform (CMP) can help resolve. Gathering this data in a centralised way is a foundational element of rightsizing. It’s also critical that your CMP can gather data for brownfield environments: it’s easy to apply rightsizing principles to new workloads, but the real cost savings come from gaining visibility into existing infrastructure and shadow IT instances, both on-prem and off.
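A minimal sketch of the chargeback piece, assuming tagged cost records exported from a hypothetical CMP (the tag names, teams, and figures are all illustrative):

```python
from collections import defaultdict

# Hypothetical cost records as a CMP might export them; the tag keys
# ("team", "app", "cloud") are illustrative, not any product's schema.
records = [
    {"team": "payments", "app": "checkout", "cloud": "aws",   "cost": 410.0},
    {"team": "payments", "app": "checkout", "cloud": "azure", "cost": 95.5},
    {"team": "data",     "app": "etl",      "cloud": "gcp",   "cost": 220.0},
    {"team": "data",     "app": "etl",      "cloud": "aws",   "cost": 180.0},
]

def chargeback(records, key):
    """Roll up costs by any tag to produce a simple chargeback report."""
    totals = defaultdict(float)
    for r in records:
        totals[r[key]] += r["cost"]
    return dict(totals)

print(chargeback(records, "team"))   # {'payments': 505.5, 'data': 400.0}
print(chargeback(records, "cloud"))  # per-cloud view of the same spend
```

The same records roll up by team, application, or cloud, which is exactly the multi-dimensional view a centralised CMP makes possible.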

Signs of wasted workload resources

The only way to keep your applications running at peak price/performance is to determine the optimum instance types and sizes for your workloads. Three measures that indicate a problem related to instances are:

  • Long runtimes at low utilisation
  • Failure to accommodate usage spikes
  • Inability to scale instances on-demand 

Cloud usage reports help you address workload inefficiencies by determining more precisely the memory, virtual CPU cores, storage, or other resources your workloads need. However, reporting is historically a passive way of dealing with the problem. The analytics built into your CMP, along with the CMP’s integration with your load balancer, can enable constant optimisation and dynamic scaling.
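The first two warning signs above can be turned into simple rules over usage metrics. A sketch, where the thresholds, metric names, and fleet data are assumptions chosen for illustration:

```python
# Flag instances whose utilisation suggests a resize.
# Thresholds (500 hours, 15% average CPU, 95% peak CPU) are illustrative.

def rightsizing_signal(avg_cpu, peak_cpu, hours_running):
    """Classify an instance from its utilisation history."""
    if hours_running > 500 and avg_cpu < 0.15:
        return "downsize: long runtime at low utilisation"
    if peak_cpu > 0.95:
        return "upsize or autoscale: usage spikes hit the ceiling"
    return "ok"

# Hypothetical fleet: (average CPU, peak CPU, hours running this month)
fleet = {
    "web-01":   (0.08, 0.30, 720),
    "batch-02": (0.55, 0.98, 200),
    "api-03":   (0.40, 0.70, 720),
}

for name, (avg, peak, hours) in fleet.items():
    print(name, "->", rightsizing_signal(avg, peak, hours))
```

A CMP with built-in analytics effectively runs rules like these continuously, instead of waiting for someone to read last month's report.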

The general rule is that selecting larger instances reduces your total instance count, which translates into lower costs. As with most general rules, there are plenty of exceptions. For bursty workloads, long-term instances such as Amazon EC2 Reserved Instances may be best for the baseline load, while larger instance types are used for bursts, and instances from the spot market are applied to other peak loads.

AWS claims its Spot Instances, which are bid on from unused EC2 capacity, can save customers as much as 90 per cent compared to the cost of On-Demand Instances. Similarly, Google Preemptible VMs cost up to 70 per cent less than the company's standard instances, although the instances terminate after 24 hours or when the resources are required for other tasks.
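To see why the reserved-baseline-plus-spot-burst mix pays off, consider a toy blended-cost model. The hourly rate and discount percentages below are hypothetical placeholders (though the spot discount is in the range the vendors advertise), not published pricing:

```python
# Illustrative blended-cost model for a bursty workload: reserved capacity
# covers the baseline, spot capacity covers the peaks.

ON_DEMAND_RATE = 0.10      # $/instance-hour, hypothetical
RESERVED_DISCOUNT = 0.40   # assume reserved is ~40% cheaper than on-demand
SPOT_DISCOUNT = 0.70       # assume spot is ~70% cheaper than on-demand

def blended_cost(baseline_instances, burst_instance_hours, hours=730):
    """Monthly cost: reserved baseline plus spot for the bursts."""
    reserved = baseline_instances * hours * ON_DEMAND_RATE * (1 - RESERVED_DISCOUNT)
    spot = burst_instance_hours * ON_DEMAND_RATE * (1 - SPOT_DISCOUNT)
    return reserved + spot

# 10 always-on instances plus 2,000 burst instance-hours in a month:
all_on_demand = (10 * 730 + 2_000) * ON_DEMAND_RATE
mixed = blended_cost(baseline_instances=10, burst_instance_hours=2_000)
print(round(all_on_demand, 2), round(mixed, 2))
```

Under these assumptions the mixed strategy costs roughly half of running everything on-demand; the exact savings depend entirely on the real discount rates and how bursty the workload actually is.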

One challenge in a multi-cloud world is that every cloud vendor has its own approach to reporting; to top it off, existing on-premises infrastructure and private clouds are often the most cluttered. Consolidating cloud management into a unified view across hybrid and multi-cloud environments can help normalise reporting and enable more accurate chargeback to internal users. When coupled with guided recommendations and migration tools, consolidation can cut costs by tens of thousands of dollars per month.

The other common provisioning mistake that results in higher costs is failing to understand how applications actually operate. The workload type determines the preferred instance qualities. For example, savings should be greatest for a batch processing job with infrequent high utilisation, which suits instances that turn off when inactive so you aren't paying for CPU cycles you don't need.

An accurate workload assessment requires applying a statistical model of workload patterns that represents hourly, daily, monthly, and quarterly activity. It takes both breadth and depth of data to feed that model, which underscores the need to know how your individual workloads function at a granular level and to understand the dependencies between application services.
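One minimal form such a model could take is a per-hour profile of utilisation samples, keeping a mean and a high percentile for each hour of the day. This is a sketch under assumptions (the synthetic samples stand in for real monitoring data, and a production model would also cover daily, monthly, and quarterly cycles):

```python
from statistics import mean

def hourly_profile(samples):
    """Build a per-hour workload profile.

    samples: list of (hour_of_day, cpu_utilisation) tuples, as might be
    pulled from monitoring data. Returns mean and ~95th percentile per hour.
    """
    buckets = {}
    for hour, util in samples:
        buckets.setdefault(hour, []).append(util)
    return {
        hour: {
            "mean": mean(vals),
            "p95": sorted(vals)[int(0.95 * (len(vals) - 1))],
        }
        for hour, vals in buckets.items()
    }

# Synthetic samples: busy at 9am, idle at 3am.
samples = [(9, 0.70), (9, 0.80), (9, 0.90), (3, 0.05), (3, 0.10)]
profile = hourly_profile(samples)
print(profile[9]["mean"])  # high daytime utilisation
print(profile[3]["mean"])  # near-idle overnight
```

A profile like this is what lets a scheduler or autoscaler size capacity to the pattern rather than to the peak.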

Cloud management should be systematic… not symptomatic

If you're still on the fence about using multiple cloud services, you're missing out. Owen Jenkins of research firm Kadence International says the benefits of multi-cloud in terms of efficiency and agility are so significant, the message from users is "just go for it." A recent study conducted by MIT Technology Review and VMware points out that companies adopting multi-cloud can expect some "growing pains".

The researchers asked 1,300 IT managers at enterprises around the world about their approach to cloud adoption. The results indicate that organisations can minimise migration glitches by having a comprehensive multi-cloud roadmap. The study also identified the three greatest challenges to successful implementation of multi-cloud: integrating legacy systems (cited by 61 per cent of respondents), the lack of skilled staff (more than 50 per cent), and understanding the new technology (61 per cent).

Buried within that data is the reason that analytics and machine learning are only part of the solution. While AIOps and related buzzwords can legitimately reduce cloud costs by 30 per cent or more, it’s critical to dig deeper and recognise what caused the problems in the first place. In the case of cloud, people and process are as important as tools and technology.

Once your analytics models are in place and your consolidated cloud is operating efficiently, it’s critical to tackle the rest of the stack. One reason for underutilisation, lack of visibility, and poor reporting across clouds is a lack of control. Control is lacking because developers just want to get their projects deployed and don’t want to be constrained. The best way to keep multi-cloud estates running smoothly is to incorporate analytics as part of a systematic approach that also includes governance and automation.

Combining CloudOps, DevOps, and AIOps could be a buzzword mashup or it could be the roadmap to success in the next era of Hybrid IT.

Brad Parks, Vice President of Business Development, Morpheus

Brad Parks is Vice President of Business Development at Morpheus. Morpheus is a US-based unified orchestration start-up focused on narrowing the gap between business expectations and the speed of IT delivery.