Incident management is the process of handling incidents that may occur, or have already occurred, in an IT service operation. An incident is an unplanned disruption or degradation of a service.
How does it work?
An incident can be reported by users or system admins, or raised by servers, network devices or any failing process that may degrade or stop an IT service. As soon as an incident is reported, the incident management team should log it with the following information:
1. A unique incident ID for tracking.
2. A severity level.
3. An assigned resource (the person or team responsible for handling the incident).
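The logging step above can be sketched in a few lines of Python. This is only a minimal illustration: the `Incident` class, the `Severity` scale, the `INC-` ID format and the `log_incident` helper are all assumptions made for this example, not part of any particular ticketing tool.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical four-level scale; a lower number means more urgent.
    CRITICAL = 1
    HIGH = 2
    MEDIUM = 3
    LOW = 4


def _new_incident_id() -> str:
    # Assumed ID format: "INC-" followed by 8 random hex characters.
    return f"INC-{uuid.uuid4().hex[:8].upper()}"


@dataclass
class Incident:
    summary: str
    severity: Severity
    assignee: str  # person or team responsible for the incident
    incident_id: str = field(default_factory=_new_incident_id)
    reported_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def log_incident(summary: str, severity: Severity, assignee: str) -> Incident:
    """Create the incident record with a unique ID, severity and owner."""
    return Incident(summary=summary, severity=severity, assignee=assignee)


incident = log_incident(
    "Database server not responding", Severity.HIGH, "dba-on-call"
)
print(incident.incident_id, incident.severity.name, incident.assignee)
```

A real team would store this record in its ticketing system; the point here is only that the ID, severity and owner are captured at the moment the incident is logged.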
Depending on the nature of the incident, the case can be handled in one of three ways:
1. Fix the problem through regular operations activity, without changing anything in the system. For example, if a database server goes down, bring it back up immediately to restore service, collect logs for further analysis, and then take measures for a permanent fix.
2. Invoke the change management process if restoring the service requires making changes to the system.
3. Invoke the problem management process if the issue requires the software or hardware manufacturer to provide a fix. In that case it is advisable to put a temporary workaround in place to restore the service.
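The three handling paths above can be expressed as a small decision function. This is a sketch under stated assumptions: the `Route` enum, the `choose_route` function and its two boolean inputs are hypothetical names, and checking for manufacturer involvement first is a choice made for this example rather than a rule from the process itself.

```python
from enum import Enum


class Route(Enum):
    REGULAR_OPS = "regular operations"
    CHANGE_MANAGEMENT = "change management"
    PROBLEM_MANAGEMENT = "problem management"


def choose_route(needs_system_change: bool, needs_vendor_fix: bool) -> Route:
    """Pick a handling path for an incident.

    Assumption: if the manufacturer must supply a fix, that takes
    precedence over a local change, because the permanent fix is not
    in the operations team's hands.
    """
    if needs_vendor_fix:
        return Route.PROBLEM_MANAGEMENT
    if needs_system_change:
        return Route.CHANGE_MANAGEMENT
    return Route.REGULAR_OPS


# A database server that only needs a restart stays with regular ops.
print(choose_route(needs_system_change=False, needs_vendor_fix=False).value)
```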
Applying incident management effectively
Applying incident management in service operation can be really challenging in a heterogeneous environment where many integrations must work together to keep services up and running. A large operation usually runs many critical applications/services to support the business, and some of them must be available 24×7. In such an environment, the following measures are advisable to make the incident management process more effective:
1. List and categorize the applications/services by criticality. Criticality can be determined from the availability, integrity and confidentiality requirements the business places on each application/service.
2. Carry out a quantifiable analysis of the business impact of service disruption/downtime, such as financial loss or revenue leakage.
3. Make the following available to the incident management team:
- A matrix containing the impact and severity level for each critical application/service
- An escalation matrix
- A notification matrix
4. Send periodic updates to all stakeholders as per the notification matrix.
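To make the list above more concrete, here is a minimal sketch of a service catalogue entry, a quantified impact estimate and a notification matrix lookup. Every name and number in it is hypothetical: the `ServiceProfile` fields, the criticality labels, the stakeholder roles, the update intervals and the revenue figure are placeholders that a real organisation would replace with its own values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceProfile:
    name: str
    criticality: str              # assumed labels: "critical", "high", "normal"
    revenue_loss_per_hour: float  # assumed business estimate


# Hypothetical notification matrix: who is updated, and how often,
# for each criticality level.
NOTIFICATION_MATRIX = {
    "critical": {"notify": ["service-owner", "it-director"],
                 "update_every_minutes": 30},
    "high": {"notify": ["service-owner"], "update_every_minutes": 60},
    "normal": {"notify": ["support-lead"], "update_every_minutes": 240},
}


def estimated_loss(profile: ServiceProfile, downtime_minutes: float) -> float:
    """Linear estimate of financial loss for a given downtime."""
    return profile.revenue_loss_per_hour * downtime_minutes / 60


def notification_plan(profile: ServiceProfile) -> dict:
    """Look up stakeholders and update interval for a service."""
    return NOTIFICATION_MATRIX[profile.criticality]


billing = ServiceProfile("billing", "critical", 12000.0)
print(estimated_loss(billing, 45))          # 12000 * 45 / 60 = 9000.0
print(notification_plan(billing)["notify"])
```

A linear loss model is the simplest possible choice; actual impact often grows faster than linearly as an outage drags on, which is one more reason to review severity over time.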
It often happens that the severity level assigned to an incident at the initial stage stays the same even as the service continues to degrade. It is therefore better to follow a time-bound impact matrix, so that the incident gets proper attention and is handled properly.
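One possible reading of a time-bound impact matrix is a table of elapsed-time thresholds, each of which raises the severity if the incident is still open. The sketch below assumes that reading; the thresholds, the numeric severity scale (1 is most severe) and the `current_severity` function are illustrative assumptions only.

```python
from datetime import timedelta

# Hypothetical matrix: once an incident has been open this long,
# its severity must be at least this level (1 = most severe).
TIME_BOUND_MATRIX = [
    (timedelta(minutes=30), 2),
    (timedelta(hours=2), 1),
]


def current_severity(initial_severity: int, elapsed: timedelta) -> int:
    """Return the severity after applying time-based escalation.

    Severity is only ever raised (the number gets smaller), never
    lowered below what was assigned when the incident was logged.
    """
    severity = initial_severity
    for threshold, level in TIME_BOUND_MATRIX:
        if elapsed >= threshold:
            severity = min(severity, level)
    return severity


# An incident logged at severity 3 and still open after 45 minutes
# is escalated to severity 2.
print(current_severity(3, timedelta(minutes=45)))
```

Re-evaluating this function on a schedule, for example each time a periodic stakeholder update is due, keeps the recorded severity in step with how long the service has actually been degraded.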
That’s all for today.