10 best practice processes for dealing with major IT incidents

Each time a major IT incident such as a payroll crash or ransomware attack happens, the IT team goes into fire-fighting mode and the resolution process becomes far more frantic than usual. This doesn’t have to be the norm if you follow best-practice steps like those outlined below.

With these steps in place, you can resolve a major incident quickly and without panic. It's also worth being aware of the best practices and up-to-date predictions when it comes to cybersecurity, as knowledge is power.

1. Clearly define major IT incidents

When an issue has a severe business impact and affects many users, you can categorise it as a major incident. It is one that forces an organisation to deviate from its existing incident management processes.

Often, high-priority IT incidents are wrongly perceived as major incidents, and many organisations have no incident plan for security breaches. This is probably due to the absence of clear ITIL guidelines. Therefore, to avoid any confusion, you must define a major incident clearly based on factors such as urgency, impact and severity.
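As a rough illustration, a definition based on urgency, impact and severity could be written down as an explicit rule. The sketch below is only one possible approach: the `Incident` class, the 1 to 5 scales and every threshold are assumptions made for the example, not ITIL values.

```python
from dataclasses import dataclass

# Hypothetical thresholds; tune them to your own organisation.
MAJOR_MIN_USERS = 50
MAJOR_MIN_SEVERITY = 4   # assumed scale: 1 (low) to 5 (critical)
MAJOR_MIN_URGENCY = 4    # assumed scale: 1 (low) to 5 (critical)


@dataclass
class Incident:
    title: str
    users_affected: int  # stands in for "impact"
    severity: int
    urgency: int


def is_major(incident: Incident) -> bool:
    """Return True only when impact, severity and urgency are all high."""
    return (
        incident.users_affected >= MAJOR_MIN_USERS
        and incident.severity >= MAJOR_MIN_SEVERITY
        and incident.urgency >= MAJOR_MIN_URGENCY
    )


payroll_crash = Incident("Payroll system down", users_affected=400, severity=5, urgency=5)
vip_laptop = Incident("Director's laptop will not boot", users_affected=1, severity=2, urgency=5)

print(is_major(payroll_crash))  # True
print(is_major(vip_laptop))     # False: urgent and high priority, but not major
```

The second example is the case the definition is meant to catch: an urgent, high-priority ticket that still does not qualify as a major incident.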

2. Have exclusive workflows

By having separate workflows, you can restore services quickly in the event of a major incident (Image credit: Getty Images)

Implementing a robust workflow helps you restore a disrupted service quickly. Separate workflows for major incidents help in seamless resolution. Focus on automating and simplifying the following when you formulate a workflow for major incidents:

  • Identifying the major incident
  • Communicating to the impacted stakeholders
  • Assigning the right people
  • Tracking the major incident throughout its lifecycle
  • Escalating when SLAs are breached
  • Resolving and closing the incident
  • Generating and analysing reports

Ensure that you also have a no-approval process for resolving major IT incidents.

3. Reel in the right resources

Ensure that your best resources are working on major IT incidents. Also, clearly define their roles and responsibilities because of the high impact these incidents have on business. You could have a dedicated or a temporary team depending on how often major incidents occur.

Some organisations have a dedicated major incident team headed by a major incident manager, whereas others have a dynamic, ad-hoc team made up of experts from various departments. Your primary objective must be to keep your resources engaged and avoid conflicts of time and priorities.

4. Train your personnel and equip them with the right tools

With the right training, IT incident response can be even more efficient, and get things back up in less time (Image credit: Getty Images)

You don’t know when a major incident will strike your IT systems, but the first step to handling it is being prepared. Divide your major incident management team into sub-teams, and train them in major incident management. Assign responsibilities by mapping skills to requirements.

Run simulation tests on a regular basis to identify strengths, evaluate performance and address gaps as needed. This will also help your team cope with stress and be prepared when facing real incidents. Equip your team with the right tools, such as smartphones, phablets and tablets with seamless connectivity, so that they can work from anywhere during an emergency.

5. Configure stringent SLAs and hierarchical escalations

Define stringent SLAs for major IT incidents. Set up separate response and resolution SLAs with clear escalation points for any breach of the process. In addition, follow a manual escalation process if the assigned technician lacks the expertise to resolve the incident. Moreover, ensure that a backup technician is always available.
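To make the idea of separate response and resolution SLAs concrete, here is a minimal sketch of a breach check that could feed an escalation step. The 15-minute and 4-hour targets and the function name `breached_slas` are assumptions for illustration only; real targets depend on your own agreements.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# Assumed targets for major incidents; replace with your own SLAs.
RESPONSE_SLA = timedelta(minutes=15)
RESOLUTION_SLA = timedelta(hours=4)


def breached_slas(
    opened_at: datetime,
    responded_at: Optional[datetime],
    resolved_at: Optional[datetime],
    now: datetime,
) -> List[str]:
    """Return the names of the SLAs that have been breached so far."""
    breaches = []
    # If nobody has responded or resolved yet, measure up to the current time.
    if (responded_at or now) - opened_at > RESPONSE_SLA:
        breaches.append("response")
    if (resolved_at or now) - opened_at > RESOLUTION_SLA:
        breaches.append("resolution")
    return breaches


opened = datetime(2024, 3, 1, 9, 0)
result = breached_slas(
    opened_at=opened,
    responded_at=datetime(2024, 3, 1, 9, 10),  # responded after 10 minutes
    resolved_at=None,                          # still open
    now=datetime(2024, 3, 1, 14, 0),           # five hours after opening
)
print(result)  # ['resolution']
```

Each name in the returned list could map to its own escalation point, for example notifying the major incident manager when the resolution SLA is breached.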

6. Keep your stakeholders informed

Letting stakeholders know what's going on will help them understand the issue, and streamline support requests (Image credit: Getty Images)

Throughout the lifecycle of major IT incidents, send announcements, notifications, and status updates to the stakeholders. Announcements in the self-service portal will prevent end users from raising duplicate tickets and overloading the help desk.

Also, send hourly or bi-hourly updates during a service downtime caused by major incidents. Have a dedicated line to respond to major incidents immediately and offer support to stakeholders. Use the fastest means of communication, such as telephone calls, direct walk-ins, live chat, and remote desktop control, instead of relying on email.

7. Tie major IT incidents with other ITIL processes

After a major incident is resolved, perform a root cause analysis by using problem management methods. Then, implement organisation-wide changes to prevent the occurrence of similar incidents in the future by following the change management process.

Speed up the entire incident, problem and change management process by providing detailed information about the assets involved using asset management. Cover all of your assets by ensuring your incident response aligns with your cyber insurance policy, too.

8. Improve your knowledge base

By creating knowledge bases of major incident details, you can be better prepared in future (Image credit: Getty Images)

Formulate simple knowledge base article templates that capture critical details, such as the type of major incident the article relates to, the latest issue resolved using the article, the owner of the article, and the resources that would be needed to implement the solution.

Create and track solutions separately for major IT incidents, so that you can access them quickly with very little effort.
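One way to picture such a template is as a small record with a flag that keeps major incident solutions separate from everyday ones. This is a sketch under assumptions: the field names, the `major_incident` flag and the sample articles are all invented for the example and do not come from any particular help desk tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KnowledgeBaseArticle:
    title: str
    incident_type: str            # type of major incident the article relates to
    latest_issue_resolved: str    # most recent issue fixed using this article
    owner: str
    resources_needed: List[str] = field(default_factory=list)
    major_incident: bool = True   # tracks major incident solutions separately


def find_major_incident_articles(
    articles: List[KnowledgeBaseArticle], incident_type: str
) -> List[KnowledgeBaseArticle]:
    """Return only major incident articles for the given incident type."""
    return [
        a for a in articles
        if a.major_incident and a.incident_type == incident_type
    ]


# Illustrative sample entries only.
articles = [
    KnowledgeBaseArticle(
        title="Restoring file servers after ransomware",
        incident_type="ransomware",
        latest_issue_resolved="Example ticket reference",
        owner="Security team",
        resources_needed=["Offline backups", "Isolated network segment"],
    ),
    KnowledgeBaseArticle(
        title="Resetting a forgotten password",
        incident_type="account",
        latest_issue_resolved="Example ticket reference",
        owner="Service desk",
        major_incident=False,
    ),
]

matches = find_major_incident_articles(articles, "ransomware")
print([a.title for a in matches])
```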

9. Review and report on major IT incidents

Document and analyse all major IT incidents, so that you can identify areas of improvement. This will help your team efficiently handle similar issues in the future.

Also, generate major incident-specific reports for analysis, evaluation and decision-making. You could generate the following reports to help in efficient decision-making:

  • Number of major incidents raised and closed each month
  • Average resolution time for major incidents
  • Percentage of downtime caused by major incidents
  • Problems and changes linked to major incidents
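The first three reports above can be derived from little more than the open and close times of each incident. The sketch below uses made-up sample records and an assumed total downtime figure purely to show the arithmetic; the data layout is an assumption, not the export format of any specific tool.

```python
from collections import Counter
from datetime import datetime, timedelta

# Made-up sample data for illustration only.
incidents = [
    {"opened": datetime(2024, 1, 5, 9, 0), "closed": datetime(2024, 1, 5, 12, 0)},
    {"opened": datetime(2024, 1, 20, 14, 0), "closed": datetime(2024, 1, 20, 15, 0)},
    {"opened": datetime(2024, 2, 2, 8, 0), "closed": None},  # still open
]
# Assumed total service downtime from all causes in the same period.
total_downtime = timedelta(hours=10)

# Number of major incidents raised and closed each month.
raised = Counter(i["opened"].strftime("%Y-%m") for i in incidents)
closed = Counter(i["closed"].strftime("%Y-%m") for i in incidents if i["closed"])

# Average resolution time, counting closed incidents only.
durations = [i["closed"] - i["opened"] for i in incidents if i["closed"]]
average_resolution = sum(durations, timedelta()) / len(durations)

# Percentage of downtime caused by major incidents.
major_downtime = sum(durations, timedelta())
downtime_share = 100 * major_downtime / total_downtime

print(dict(raised))
print(dict(closed))
print(average_resolution)
print(downtime_share)
```

Treating each incident's open-to-close duration as downtime is a simplification; if a service was only partly degraded, you would record the actual outage window instead.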

10. Document major incident processes for continual service improvement

Documenting processes for expedient referencing is another way to prepare for future crises (Image credit: Getty Images)

It is best practice to document major incident processes and workflows for ready reference. The documentation could capture details such as the number of personnel involved, their roles and responsibilities, communication channels, tools used for the fix, approval and escalation workflows, and the overall strategy, along with baseline metrics for response and resolution.

Top management must evaluate processes on a regular basis to check whether targeted performance levels in major incident management are being met. This can help rectify flaws and support continual service improvement.

IT incidents: Summary

Major IT incidents are unavoidable, and each one is a learning experience for your team. Adhering to these practices could be your first step towards mastering the art of handling major incidents.

Further reading on cybersecurity

Be proactive when it comes to cybersecurity by installing the best antivirus software, which takes one large part of incident response out of your IT team's hands. Maintaining cybersecurity in your business has never been more important, and the fallout from a cyber attack can cause a business to suffer a series of knock-on effects too.

When it comes to disaster recovery, it's wise to understand where that ends and backups to the cloud begin: we compared disaster recovery vs cloud backup to see what the differences and similarities are.

Prithiv RajKumar is a marketing analyst at ManageEngine.
