
Top five ways to improve your incident management

(Image credit: Dotshock / Shutterstock)

Has a recent influx of IT incidents exposed inefficiencies in your incident management processes? Has it revealed considerable bottlenecks within your IT business processes that were previously hidden? Multiple IT Directors, Managers and Service Delivery Managers have come to me expressing their discontent with how long it takes to resolve incidents, and with the same problematic incidents rearing their heads again and again. I am going to share my top five tips to help your incident management process become more efficient.

  • 1. Sharing the knowledge

Transferring knowledge between yourself and your employees is critical to reducing fix times and stopping bottlenecks from developing. Imagine you are working in an IT team of 30 people who all have their own areas of speciality. One of these members is a print specialist. One day, the company printers encounter an issue and they all stop working. Each printer takes 1-2 hours to fix, so the print specialist could spend up to 8 hours, one full working day, fixing them all and would be unable to complete other essential work. Now suppose this team member is also the only one who knows the company phone system setup, and staff are experiencing outbound calling issues on that same day. This causes a greater problem for the IT team, as the printer and phone issues cannot both be resolved on the same day. These critical tasks will spill over to the next day, and further difficulties will appear if team members are absent.

This issue, which appears considerably large in scale, has a remarkably simple solution: share the knowledge. This could include peer training. For example, the print specialist could upskill another member of the IT team, enabling them to resolve printer issues in future. Alternatively, you could create knowledge base articles on how to resolve basic printer issues. These articles can also be made available to end users so they, too, can resolve issues on their own without needing to contact the service desk. Both options would allow the print specialist to delegate work, reducing the team’s reliance on a single member to perform this crucial role.

Some ITSM solutions can suggest knowledge base articles to users while they are logging a ticket via the self-service portal, based on the information provided in the form. The same can happen for agents: suggested articles appear while they are viewing a ticket. Check whether your current ITSM solution has this feature. If not, keep an eye out for it when you are next in the market so you can strengthen your IT team’s capability.
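To make the idea of article suggestion concrete, here is a minimal sketch in Python. It assumes a simple keyword-overlap approach; real ITSM products may rank articles quite differently, and the `KNOWLEDGE_BASE` contents and the `suggest_articles` function are hypothetical names used only for illustration.

```python
import re

# Hypothetical knowledge base: article title -> keywords (illustrative data only).
KNOWLEDGE_BASE = {
    "Clearing a printer paper jam": {"printer", "paper", "jam"},
    "Reconnecting to a network printer": {"printer", "network", "offline"},
    "Fixing outbound calling on desk phones": {"phone", "outbound", "call", "calling"},
}


def suggest_articles(ticket_text, knowledge_base=KNOWLEDGE_BASE, limit=3):
    """Return article titles ranked by how many of their keywords appear in the ticket text."""
    words = set(re.findall(r"[a-z]+", ticket_text.lower()))
    scored = []
    for title, keywords in knowledge_base.items():
        overlap = len(words & keywords)
        if overlap > 0:
            scored.append((overlap, title))
    # Highest overlap first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:limit]]


if __name__ == "__main__":
    for title in suggest_articles("My printer is offline and will not print"):
        print(title)
```

The same function could serve both the end user filling in a portal form and the agent viewing the ticket, since both only need the ticket text as input.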

  • 2. Problem management

A well-constructed problem management process is crucial to any successful incident management process. Sometimes things go wrong, and this can impact several users at one time. The aim of problem management is to reduce the number of issues that occur later by effectively investigating and resolving the source of incidents.

When a major incident occurs, for example when the internet connection to a building drops, it is imperative that the correct team members investigate the issue and restore service as soon as possible. A problem record should also be linked to the major incident so the root cause can be identified and, if possible, a workaround found. Once you know exactly what is causing the issue and there is a documented workaround, it is defined as a known error. Known errors are stored as records in the known error database (KEDB), where they can be evaluated for a permanent resolution or referred to later.

For some problems, a fix might not always be the best solution. Once the root cause analysis has been completed and the key problem is clear, you will need to work out whether the relevant fix is financially viable. A workaround that has little impact and takes minimal time to carry out may be a better solution than a costly, high-impact change. Therefore, each problem needs to be judged on its own merits.

There are quality ITSM solutions readily available with automated problem detection. They will notify you when the ticket currently being viewed matches an active problem on details such as category, location, and subject. If no problem ticket is currently open and the system notices multiple incidents with the same matching criteria, the tool can advise that there may be a problem underlying them. Most ITSM solutions will also allow updates to be sent from the parent problem to all users of the linked incidents. This feature enables you to send one update rather than duplicate messages to each affected user. My next tip, remote monitoring, can add greater value to the problem management process.
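As a rough illustration of how such detection might work, the sketch below groups open incidents by category, location, and subject and flags any group that reaches a threshold. The field names, the `find_possible_problems` function, and the threshold of three are assumptions made for this example, not the behaviour of any particular ITSM product.

```python
from collections import defaultdict


def find_possible_problems(incidents, threshold=3):
    """Group incidents by (category, location, subject) and return the groups
    that contain at least `threshold` incidents, keyed by the matching criteria."""
    groups = defaultdict(list)
    for incident in incidents:
        key = (
            incident["category"],
            incident["location"],
            incident["subject"].strip().lower(),  # ignore case and stray spaces
        )
        groups[key].append(incident["id"])
    return {key: ids for key, ids in groups.items() if len(ids) >= threshold}


if __name__ == "__main__":
    # Illustrative sample data only.
    sample = [
        {"id": 1, "category": "Network", "location": "Head office", "subject": "No internet"},
        {"id": 2, "category": "Network", "location": "Head office", "subject": "no internet "},
        {"id": 3, "category": "Network", "location": "Head office", "subject": "No Internet"},
        {"id": 4, "category": "Printing", "location": "Head office", "subject": "Paper jam"},
    ]
    for criteria, ids in find_possible_problems(sample).items():
        print("Possible underlying problem:", criteria, "incidents:", ids)
```

Each flagged group is a candidate for a parent problem record, from which a single update could then be sent to everyone on the linked incidents.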

  • 3. Remote monitoring

Several remote monitoring tools are currently available. These not only automatically scan your network and pull back critical device information, but also run automated tests and regular checks on devices so you can resolve device issues before the end user notices. Take pinging a server as an example. The monitoring tool pings the server at regular intervals and checks for a response. If the ping succeeds, no further action is required. If the ping fails, you have a problem, and the monitoring tool will advise you of this with a notification.
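The check-and-notify cycle described above can be sketched in a few lines of Python. Note that this example swaps the ICMP ping for a TCP connection attempt, because sending raw ICMP packets usually requires elevated privileges; the `tcp_probe` and `check_servers` names, the port, and the timeout are all assumptions for illustration.

```python
import socket


def tcp_probe(host, port=443, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout.
    This stands in for an ICMP ping, which typically needs elevated privileges."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_servers(hosts, probe=tcp_probe, notify=print):
    """Probe every host once and call notify() for each one that does not respond.
    Returns the list of hosts that failed the check."""
    failed = []
    for host in hosts:
        if not probe(host):
            failed.append(host)
            notify(f"ALERT: no response from {host}")
    return failed
```

A monitoring tool would run something like `check_servers` on a schedule, for example every minute, and route the notification to email, chat, or the service desk rather than printing it.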

If you are not already using a monitoring tool for these essential checks, I recommend you do so, based on previous experience with my clients. These alerts benefit your incident management process not only by letting you work on resolving issues immediately, but also by preventing tens or hundreds of incident tickets being logged by frustrated end users. If the issue is actioned efficiently, employees may not even notice the loss of a service. There will therefore be fewer emails received by the service desk and fewer responses to be sent out by your team, reducing your IT team’s overall workload. Faster awareness of the issue results in less downtime.

Why not take this one step further? Most ITSM solutions have integrations with remote monitoring and management (RMM) tools so that, when error alerts are received, they automatically create an incident ticket and relate it to the relevant asset or configuration item (CI). Some ITSM solutions will also allow you to automatically update the service statuses displayed to end users when these alerts are received.
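A minimal sketch of that alert-to-ticket step might look like the following. The alert format, the `CMDB` lookup table, and the `create_incident_from_alert` function are hypothetical; a real integration would use whatever interface the RMM and ITSM tools actually provide.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

# Hypothetical lookup: hostname -> configuration item (CI) identifier.
CMDB = {"fileserver01": "CI-1001", "core-switch-a": "CI-1002"}

_ticket_numbers = count(1)  # simple in-memory ticket number generator


@dataclass
class Incident:
    number: int
    summary: str
    ci_id: Optional[str]  # None when the host is not found in the CMDB


def create_incident_from_alert(alert, cmdb=CMDB):
    """Turn a monitoring alert into an incident linked to the matching CI, if any."""
    hostname = alert["hostname"]
    return Incident(
        number=next(_ticket_numbers),
        summary=f"{alert['check']} failed on {hostname}",
        ci_id=cmdb.get(hostname),
    )


if __name__ == "__main__":
    print(create_incident_from_alert({"hostname": "fileserver01", "check": "ping"}))
```

Leaving `ci_id` empty for unknown hosts, rather than rejecting the alert, means the incident is still logged and an agent can link the asset manually.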

  • 4. Capturing the correct data first time

How many times a day do you find yourself asking a user for more information about an issue they are experiencing? 5? 10? If your colleagues are constantly chasing users for more information, your team could be wasting a significant amount of time each working day. Not only is your time wasted, but your users’ time as well, which can lead to frustration. Wouldn’t it be better, and easier, if each ticket already had all the information you required about the user’s query, so you could resolve the issue first time? Using a self-service portal is your best solution.

A service catalogue provided on an end user portal allows you to build different question forms based on the type of issue the user is experiencing. For example, if a user cannot browse to a webpage, you might ask for the URL and whether an error message has popped up. Having all the information about a user’s issue to hand enables the engineer to work more efficiently and to investigate and resolve the issue swiftly.
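One way to picture such forms is as a list of required questions per issue type, with a check that nothing has been left blank before the ticket is accepted. The sketch below assumes this simple structure; the `FORMS` definitions and the `missing_answers` function are hypothetical, and the printer form is included purely as a second illustration.

```python
# Hypothetical form definitions: issue type -> required questions.
FORMS = {
    "website_unreachable": ["url", "error_message"],
    "printer_fault": ["printer_name", "error_message"],
}


def missing_answers(issue_type, answers, forms=FORMS):
    """Return the required questions that the user left blank for this issue type."""
    return [
        question
        for question in forms[issue_type]
        if not str(answers.get(question, "")).strip()
    ]


if __name__ == "__main__":
    submitted = {"url": "https://example.com", "error_message": ""}
    print(missing_answers("website_unreachable", submitted))
```

A portal would typically refuse to submit the form until this list is empty, so the engineer never has to email the user for the basics.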

As you can see, the benefits of having all information provided in one go using a service catalogue far outweigh that email saying, “I can’t access a website”.

  • 5. Reporting

Although reporting is point five, it is far from the least important. Effective reporting is crucial to making any improvements or changes to your incident management process so you can make it the best it can be.

Let us take point four, Capturing the Correct Data First Time, as an example. Reporting will show you the average number of interactions on an incident ticket and, most importantly, the number of emails sent to the end user. If the report allows you to filter by category and by the method used to log the incident, even better. The number of emails sent to a user will vary depending on the type of issue. However, if your team averages over 20 outbound emails per incident, it is unlikely to be receiving all the essential information first time round. A simple email may take 2-5 minutes to compose and send; on that estimate, 20 emails could take between 40 and 100 minutes.
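The email estimate above is simple arithmetic, shown here as a small Python helper so you can plug in your own figures. The function name and its default values are assumptions taken from the 2-5 minute estimate in the text.

```python
def email_time_range(emails, min_minutes=2, max_minutes=5):
    """Return the (lowest, highest) total minutes spent composing the given number of emails."""
    return emails * min_minutes, emails * max_minutes


if __name__ == "__main__":
    low, high = email_time_range(20)
    print(f"20 emails take between {low} and {high} minutes")
```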

Other examples could include:

  • Average time to resolve based on category: allows you to find issues which take your team longer to resolve.
  • Tickets logged by channel: are your users using the knowledge base on your self-service portal before logging a ticket? Multiple incidents submitted via email would suggest otherwise.
  • First level resolution rate: displays how effective your first line team is and allows you to identify areas for upskilling. Further reporting on this area, such as a breakdown by categories, would tell you more.
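The three reports listed above can each be computed from basic ticket data. The following sketch assumes every ticket records a category, a logging channel, the hours taken to resolve it, and whether the first line team resolved it; these field names and the sample tickets are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean

# Illustrative sample data only.
TICKETS = [
    {"category": "Printing", "channel": "email", "hours_to_resolve": 4.0, "resolved_by_first_line": False},
    {"category": "Printing", "channel": "portal", "hours_to_resolve": 2.0, "resolved_by_first_line": True},
    {"category": "Network", "channel": "email", "hours_to_resolve": 1.0, "resolved_by_first_line": True},
    {"category": "Network", "channel": "phone", "hours_to_resolve": 3.0, "resolved_by_first_line": True},
]


def average_resolve_time_by_category(tickets):
    """Average hours to resolve, per category."""
    by_category = defaultdict(list)
    for ticket in tickets:
        by_category[ticket["category"]].append(ticket["hours_to_resolve"])
    return {category: mean(hours) for category, hours in by_category.items()}


def tickets_by_channel(tickets):
    """Number of tickets logged through each channel."""
    return Counter(ticket["channel"] for ticket in tickets)


def first_level_resolution_rate(tickets):
    """Fraction of tickets resolved by the first line team (0.0 if there are no tickets)."""
    if not tickets:
        return 0.0
    resolved = sum(1 for ticket in tickets if ticket["resolved_by_first_line"])
    return resolved / len(tickets)


if __name__ == "__main__":
    print(average_resolve_time_by_category(TICKETS))
    print(tickets_by_channel(TICKETS))
    print(first_level_resolution_rate(TICKETS))
```

None of these reports is possible unless the underlying fields are captured on every ticket, which is why the data model deserves attention when the system is configured.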

When configuring or maintaining an ITSM system, it is important to keep key data requests in mind throughout. If the data is not captured, you cannot report on it.

Daniel Goldsmith, APAC Executive, Halo Service Solutions Melbourne