Over the last 15 years one of the critical things that has characterised the relationship between IT and business has been the ongoing drive to achieve organisational agility. This is primarily down to the need to innovate, and to anticipate and react to competitor moves. Historically, IT has been seen as a back-office function, but now it is seen as a means of achieving agility, which has resulted in IT systems changing forever.
The first change, which is not frequently emphasised, is that IT systems have become more modular and are now broken down into more independent components. Why is this? Well, it’s easier to modify systems if they are built modularly. This change in philosophy was driven by the business need for greater agility which led to the re-architecting of IT systems on a serious scale. The concept of modularity was there before, but commercial deployments were undertaken in direct response to business requirements.
It’s important to note that when you have a modular system, components act independently of one another. Therefore, to understand what the system is doing you need to observe it from end to end. Historically, with monolithic systems, you only had to look at a small part of the system to infer what was going on under the hood. With modularity, the overall performance of the system can’t be analysed by viewing a small snapshot. This means that organisations now require more data from IT systems to understand what is going on. However, it’s not just a question of increased volumes of data; the behaviour of individual data sets has also become harder to interpret. Therefore, in many respects, the drive for modularity has led to the emergence of Artificial Intelligence for IT Operations (AIOps), but it’s not the only factor…
Modularity has also led to the greater distribution of IT systems. They are now geographically spread and no longer confined to the data centre. Compute fabric is spread far and wide, which has led to genuinely distributed computation and makes it harder to observe and understand what is occurring in the system.
Pre-requisite for basic visibility
In addition, a growing percentage of modules, such as containers and microservices, are short-lived, so the final nail in the coffin for Ops teams is that the system they’re trying to manage and monitor is constantly changing and evolving.
In summary, the business demanded increased agility from IT, which has increased the difficulty of managing and supporting IT systems. Agility has come at a price. At Moogsoft we talk about an agility tax and it’s fair to say that the tax has been severe. So, given that IT systems are not going to revert to their earlier, simpler form, we must assume that these challenges are only going to get worse. Given all this, how can you assure the reliability of IT systems?
Unfortunately, the complexity of these systems now exceeds what a human being or even a team of human beings can manage. Datasets have become too large and diverse and data patterns are too complicated for the unaided human eye to process intelligently, no matter how experienced that person is.
Therefore, IT Operations teams need to invest in AIOps, not to predict the future, to run a fully autonomic self-healing data centre, or to reduce mean time to discovery to almost zero. They need it so they have eyes on how the system is performing. The alternative is to be blind and to pretend that the team is in control.
AIOps is now a pre-requisite for basic visibility. But, there is scepticism. There is still a strong sense that IT Operations is a cost centre and every new investment in a software layer must be compensated by getting rid of something else. The issue is that business has yet to fully comprehend that if it wants both agility and reliability it needs to make an investment in AIOps.
On the IT side, AIOps is perceived as threatening the status quo for many IT Ops teams, and DevOps teams are more likely to think they can do it themselves. So, although this type of technology is incredibly necessary, there are several teams, for very different reasons, that don’t fully appreciate the significance of AIOps. Having said all that, we are dealing with a market that in 2018 will be worth $2.5 billion and is growing at a rate of 25 per cent. So, even with all the resistance, money is being spent in this area.
If you embrace AIOps, do you need to change culture? I would say there are two changes. Firstly, because we are dealing with rapidly changing IT systems the way in which teams observe and try to resolve problems changes. Previously, top down, deterministic, almost mechanistic approaches have been used. Due to the speed of change in IT systems, this approach no longer works. Decision making now needs to be more dynamic and collaborative as everything is more distributed. Look at the way agile methodologies have changed application development – a similar wave of agile thinking is sweeping over IT Operations. AIOps is part of a tool kit which enables IT Ops teams to become more dynamic and act fast. So, the way in which people work will change, as decision making becomes more democratic and distributed.
One important footnote here is how teams handle incident management and problem management. Incidents are things you respond to quickly, and if there is a pattern of incidents over time, this highlights an underlying problem, which then introduces a separate set of tasks to deal with that issue. Classically, incident and problem management were viewed as distinct processes. That distinction completely dissolves in a modern IT environment, as things are changing so rapidly that you cannot distinguish between incidents and problems. What AIOps brings to the table is the ability to present incidents within the context of the problems that they manifest, so you move directly from the observation of the incident to problem management.
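To make the idea of a pattern of incidents pointing to an underlying problem a little more concrete, here is a minimal Python sketch. It is purely illustrative and is not how any particular AIOps product works: the `Incident` fields, the `group_into_problems` function, and the thresholds are all hypothetical names and values chosen for this example. It simply treats several incidents on the same component within a short time window as a candidate problem.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    # Hypothetical fields, chosen for illustration only.
    incident_id: str
    component: str
    minute: int  # minutes since an arbitrary start time


def group_into_problems(incidents, min_incidents=3, window_minutes=60):
    """Map each component with a recurring pattern to all of its incident ids.

    A "pattern" here is simply min_incidents or more incidents on the same
    component that fall within window_minutes of each other (an assumption
    made for this sketch, not a standard definition).
    """
    by_component = defaultdict(list)
    for incident in incidents:
        by_component[incident.component].append(incident)

    problems = {}
    for component, items in by_component.items():
        items.sort(key=lambda i: i.minute)
        # Slide over the time-ordered incidents looking for a dense enough run.
        for start in range(len(items) - min_incidents + 1):
            run = items[start:start + min_incidents]
            if run[-1].minute - run[0].minute <= window_minutes:
                problems[component] = [i.incident_id for i in items]
                break
    return problems


incidents = [
    Incident("INC-1", "checkout", 0),
    Incident("INC-2", "checkout", 20),
    Incident("INC-3", "search", 25),
    Incident("INC-4", "checkout", 45),
]
problems = group_into_problems(incidents)
print(problems)
```

In this toy data set the three checkout incidents are presented together as one candidate problem, while the single search incident is not. Real systems would correlate on far richer signals than a component name and a timestamp.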
Secondly, IT Operations teams are at some point going to have to acquire data science skill sets. You buy technologies like AIOps to perform complex inferences, which means decisions need to be made based on the results these technologies present. To make these decisions you need some understanding of why you are making the decision in the first place. These are decisions that impact both IT and the business in general. No matter how the results are obtained, there is a degree of responsibility on the team or individual to be able to interpret them.
The need for more data science skills
So, within IT Operations there is a need for more data science skills to deal with the new complexities IT systems are producing. Data science skills are required, but you don’t need the title of a data science professional to do your job. In summary, by using AIOps, IT Ops teams will become more collaborative, democratic, less rigid, more dynamic, and more entrepreneurial. Ultimately, this is the real consequence of the business wanting to become more agile.
If you don’t use AIOps the consequences are severe. It is no longer just an option; it’s a necessity. Without deploying it, your mean time to resolution will just get longer and longer. The worst thing is that you will not understand what’s going on. You will basically be running blind, which is serious when you consider that many organisations rely on IT to help generate revenues. It’s now simply a question of how quickly you can get up and running with an AIOps solution.
Will Cappelli, CTO EMEA and Global VP of Product Strategy, Moogsoft