About Major Incident Management
BTech Alerts & Notifications - Check this page for the latest information on emergencies and system outages. All times are Pacific.
Major Incident Management
What is a Major Incident?
A Major Incident is a significant disruption to Swinerton’s IT services, customer experience, or core product line. Major Incident Management (MIM) is the process of addressing and resolving these disruptions, and it typically involves six aspects:
- Identification
- Response
- Organization
- Communication
- Review
- Resolution
What is the difference between a Major Incident and a Problem?
A Major Incident becomes a Problem when the same issue recurs. Recurrence points to an underlying root cause that needs to be addressed through further investigation and proactive problem management. Rather than only resolving the immediate incident, problem management looks for patterns, analyzes similar incidents, and identifies the root cause to prevent future occurrences. Consider moving a Major Incident to a Problem when you see:
- Recurring incidents
- High impact on business operations
- Difficulty identifying root cause
- Long resolution time
Types of Major Incidents
There are three types of Major Incidents, defined by the scope of impacted employees:
- Full outage: No user can access the service.
- Partial outage: Some users can access the service, while others cannot.
- Performance degradation: The service is available but has noticeable performance issues, such as slow response times or limited functionality.
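The three types differ only in how many users can still reach the service and whether it is performing normally. The following Python sketch encodes that decision rule. The function name, its inputs (a count of users with access, a total user count, and a degradation flag), and the choice to return None when no type applies are illustrative assumptions, not fields from any BTech system.

```python
from enum import Enum
from typing import Optional


class MajorIncidentType(Enum):
    FULL_OUTAGE = "Full outage"
    PARTIAL_OUTAGE = "Partial outage"
    PERFORMANCE_DEGRADATION = "Performance degradation"


def classify_major_incident(
    users_with_access: int, total_users: int, degraded: bool
) -> Optional[MajorIncidentType]:
    """Map observed impact to a Major Incident type (hypothetical helper)."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    if not 0 <= users_with_access <= total_users:
        raise ValueError("users_with_access must be between 0 and total_users")
    if users_with_access == 0:
        # No user can access the service.
        return MajorIncidentType.FULL_OUTAGE
    if users_with_access < total_users:
        # Some users can access the service, others cannot.
        return MajorIncidentType.PARTIAL_OUTAGE
    if degraded:
        # Everyone can reach the service, but it is slow or limited.
        return MajorIncidentType.PERFORMANCE_DEGRADATION
    # Full access and normal performance: none of the three types applies.
    return None
```

The checks run from most to least severe, so a service that is both partially unavailable and slow is classified as a partial outage.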
Service Level Agreement (SLA) Differences: Incidents vs. Major Incidents
For incidents and service requests owned by BTech, the following SLA targets apply:
- Urgent: respond within 4 hours; resolve within 3 days
- High: respond within 8 hours; resolve within 5 days
- Medium: respond within 12 hours; resolve within 10 days
- Low: respond within 16 hours; resolve within 20 days
All times are in business hours. The business-hours clock runs from 5:00 a.m. to 6:00 p.m. Pacific Time, Monday through Friday.
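Because the clock only runs during business hours, a response target can span nights and weekends. The Python sketch below shows one way to compute a due time for the hour-based response targets. It assumes timestamps are naive datetimes already in Pacific local time and ignores company holidays, and `add_business_hours` is a hypothetical helper, not a feature of the ticketing system. This page does not say whether the "days" resolution targets are business days, so the sketch covers hours only.

```python
from datetime import datetime, time, timedelta

# Business-hours clock from this page: 5:00 a.m. to 6:00 p.m., Monday-Friday.
# Assumption: timestamps are naive datetimes in Pacific local time,
# and company holidays are not modeled.
OPEN = time(5, 0)
CLOSE = time(18, 0)


def _next_open(moment: datetime) -> datetime:
    """Return 5:00 a.m. on the calendar day after `moment`."""
    return datetime.combine(moment.date() + timedelta(days=1), OPEN)


def add_business_hours(start: datetime, hours: float) -> datetime:
    """Return the moment that falls `hours` business hours after `start`."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    current = start
    remaining = timedelta(hours=hours)
    while True:
        if current.weekday() >= 5:  # Saturday or Sunday: clock is stopped
            current = _next_open(current)
            continue
        day_open = datetime.combine(current.date(), OPEN)
        day_close = datetime.combine(current.date(), CLOSE)
        if current < day_open:  # before opening: clock starts at 5:00 a.m.
            current = day_open
        if current >= day_close:  # after closing: resume the next day
            current = _next_open(current)
            continue
        available = day_close - current
        if remaining <= available:
            return current + remaining
        remaining -= available  # use up the rest of today, carry the remainder
        current = _next_open(current)


# An Urgent ticket (4-hour response) logged Friday, June 7, 2024 at 4:00 p.m.
print(add_business_hours(datetime(2024, 6, 7, 16, 0), 4))  # prints 2024-06-10 07:00:00
```

Two business hours are used on Friday afternoon, the weekend is skipped, and the remaining two hours run from 5:00 a.m. Monday.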
For Major Incidents on services owned by BTech, the following SLA targets apply:
- Full outage: respond within 30 minutes; resolve within 8 hours*
- Partial outage: respond within 1 hour; resolve within 24 hours*
- Performance degradation: respond within 2 hours; resolve within 5 days*
*If the Major Incident is converted to a Problem, the Major Incident SLA no longer applies.
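The Major Incident targets, together with the rule that conversion to a Problem ends the Major Incident SLA, can be captured in a small lookup. This Python sketch is illustrative only. It assumes targets are measured as elapsed clock time (this page does not say whether the business-hours clock applies to Major Incidents), treats "5 days" as 5 × 24 hours, and reads the footnote literally as suspending both targets. The names `MAJOR_INCIDENT_SLA` and `major_incident_sla_status` are hypothetical.

```python
from datetime import timedelta
from typing import Optional

# (response target, resolution target) per Major Incident type, from this page.
# Assumption: elapsed clock time, with "5 days" meaning 5 x 24 hours.
MAJOR_INCIDENT_SLA = {
    "Full outage": (timedelta(minutes=30), timedelta(hours=8)),
    "Partial outage": (timedelta(hours=1), timedelta(hours=24)),
    "Performance degradation": (timedelta(hours=2), timedelta(days=5)),
}


def major_incident_sla_status(
    incident_type: str,
    time_to_response: timedelta,
    time_to_resolution: timedelta,
    converted_to_problem: bool,
) -> Optional[dict]:
    """Report whether each target was met; None once the SLA no longer applies."""
    if converted_to_problem:
        # Converting to a Problem removes the Major Incident SLA.
        return None
    respond_within, resolve_within = MAJOR_INCIDENT_SLA[incident_type]
    return {
        "response_met": time_to_response <= respond_within,
        "resolution_met": time_to_resolution <= resolve_within,
    }


# A full outage answered in 20 minutes but resolved after 9 hours.
print(major_incident_sla_status("Full outage", timedelta(minutes=20), timedelta(hours=9), False))
# prints {'response_met': True, 'resolution_met': False}
```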
Procedures Overview for Major Incidents
When the Service Desk begins to spot a trend in the tickets being reported, this process guides any Service Desk team member through starting the Major Incident Management process.
- At the first sign of a possible trend (3 or more similar tickets, or a complete outage or loss of availability of a product or service), post a note describing what you are seeing in the Daily Trending Issues channel of the Service Desk Teams site.
- Create a new Incident with a clear summary of the issue in the Subject field and a description explaining why a Major Incident is being created.
- Convert the Incident to a Major Incident and fill in the required fields:
  - Business impact
  - Impacted locations / services
  - Number of customers impacted
  - Incident commander
  - Swinerton Managed Service (Yes/No)
- Post the new tracking ticket number in the Daily Trending Issues channel so the Service Desk Team can associate any new incidents with it.
- Associate all other related incidents with this Major Incident record
- Coordinate who will update the Freshservice Announcement Page (Joe -> Jason -> Scott)
- Contact the Product Manager of the affected product (see the Products and Service Catalog).
- Coordinate who will update the BTech Alert Page (Product Managers -> Jason -> Scott -> Jack)
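The trigger in the first step (three or more similar tickets, or a complete outage) can be expressed as a simple rule. This Python sketch assumes "similar" means "reported against the same service" and that agents can flag a ticket as a complete outage. The `Ticket` fields and `services_to_flag` are hypothetical, and in practice judging similarity still takes a person reading the tickets.

```python
from collections import Counter
from dataclasses import dataclass

TREND_THRESHOLD = 3  # "3+ similar tickets" from the procedure


@dataclass(frozen=True)
class Ticket:
    ticket_id: str
    service: str               # hypothetical similarity key: the affected service
    full_outage: bool = False  # hypothetical flag for a reported complete outage


def services_to_flag(tickets: list[Ticket]) -> set[str]:
    """Return normalized service names that meet the Major Incident trigger."""
    counts = Counter(t.service.strip().lower() for t in tickets)
    # Trigger 1: three or more tickets for the same service.
    flagged = {service for service, n in counts.items() if n >= TREND_THRESHOLD}
    # Trigger 2: any single ticket reporting a complete outage.
    flagged |= {t.service.strip().lower() for t in tickets if t.full_outage}
    return flagged


tickets = [
    Ticket("101", "VPN"),
    Ticket("102", "vpn "),
    Ticket("103", "VPN"),
    Ticket("104", "Email"),
    Ticket("105", "Timesheets", full_outage=True),
]
print(sorted(services_to_flag(tickets)))  # prints ['timesheets', 'vpn']
```

Service names are trimmed and lowercased before counting so that minor differences in how agents type the service name do not split a trend.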
Role of Incident Commander
The Incident Commander is a designated individual, typically a member of the Service Desk unless the role is delegated to someone else, who is responsible for coordinating the mobilization of support efforts for the Major Incident.
The Incident Commander is responsible for keeping the Major Incident record updated with its current status and for coordinating follow-up efforts and communications with impacted employees and stakeholders. The Incident Commander will:
- Verify that all related tickets are closed once the issue has been resolved.
- Work with Product Managers for communication updates.
- Coordinate updates to outgoing communications.
- Coordinate the Review and Resolution sequence for an “after-action” report to BTech leadership.
- Close out the Major Incident.
To ensure timely and appropriate communication during a Major Incident, the Incident Commander will create a Teams group chat that includes the relevant individuals. A daily status summary will be emailed to BTech leadership.
Swinerton Managed Service
The Major Incident Management process is also used for outages and degradations of products and services that BTech supports but does not manage. When creating the Major Incident, indicate whether it involves a managed service.
The SLA timer runs for all Major Incidents, but only incidents on services managed by BTech are included in metric reporting.
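Because the SLA timer runs for every Major Incident but only BTech-managed services count toward metrics, reporting needs a filtering step. The following Python sketch computes a resolution-compliance rate under two assumptions: that the "Swinerton Managed Service" field marks services managed by BTech, and that incidents converted to a Problem are also excluded because their Major Incident SLA no longer applies. The record type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MajorIncidentRecord:
    number: str
    swinerton_managed: bool     # assumed to mirror "Swinerton Managed Service (Yes/No)"
    converted_to_problem: bool  # Major Incident SLA no longer applies once converted
    met_resolution_sla: bool


def reportable_sla_compliance(records: list[MajorIncidentRecord]) -> Optional[float]:
    """Share of reportable Major Incidents resolved within SLA, or None if none qualify."""
    reportable = [
        r for r in records
        if r.swinerton_managed and not r.converted_to_problem
    ]
    if not reportable:
        return None
    met = sum(1 for r in reportable if r.met_resolution_sla)
    return met / len(reportable)


records = [
    MajorIncidentRecord("MI-1", True, False, True),
    MajorIncidentRecord("MI-2", True, False, False),
    MajorIncidentRecord("MI-3", False, False, False),  # supported but not managed: excluded
    MajorIncidentRecord("MI-4", True, True, False),    # converted to a Problem: excluded
]
print(reportable_sla_compliance(records))  # prints 0.5
```

Returning None rather than 0.0 when nothing qualifies keeps an empty reporting period from looking like a period of total SLA failure.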