About Major Incident Management

BTech

BTech Alerts & Notifications: Check this page for the latest information on emergencies and system outages. All times are Pacific.

What is a Major Incident?

A Major Incident is a significant disruption to Swinerton’s IT services, customer experience, or core product line. Major Incident Management (MIM) is the process of addressing and resolving these disruptions. There are typically six aspects of MIM:

Identification
Response
Organization
Communication
Review
Resolution

What is the difference between a Major Incident and a Problem?

A Major Incident becomes a Problem when it recurs, that is, when the same issue happens repeatedly. Recurrence indicates an underlying root cause that must be addressed through further investigation and proactive problem management. Rather than only resolving the immediate incident, problem management involves looking for patterns, analyzing similar incidents, and identifying the root cause to prevent future occurrences. Consider moving a Major Incident to a Problem when it shows any of the following:

Recurring incidents
High impact on business operations
Difficulty identifying root cause
Long resolution time

Types of Major Incidents

There are three types of Major Incidents, defined by the scope of impact on employees:

Full outage

No access to the service for any user.

Partial outage

Some users can access the service, while others cannot.

Performance degradation

Service is available but experiencing noticeable performance issues such as slow response times or limited functionality.
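Because the three types differ only in who can reach the service and how well it performs, the classification can be sketched as a small decision function. This is a minimal illustration, not part of any BTech tool: the inputs `users_total`, `users_with_access`, and `degraded` are hypothetical measurements a responder might gather.

```python
from typing import Optional


def classify_major_incident(
    users_total: int, users_with_access: int, degraded: bool
) -> Optional[str]:
    """Return the Major Incident type for an observed service state, or None.

    All three inputs are hypothetical measurements, not fields of a real system.
    """
    if users_total <= 0:
        raise ValueError("users_total must be positive")
    if users_with_access == 0:
        return "Full outage"  # no user can access the service
    if users_with_access < users_total:
        return "Partial outage"  # some users can access it, others cannot
    if degraded:
        return "Performance degradation"  # available to all, but slow or limited
    return None  # healthy service: not a Major Incident


print(classify_major_incident(250, 0, False))   # Full outage
print(classify_major_incident(250, 180, True))  # Partial outage
```

Access scope is checked before performance, so a service that is both slow and unreachable for some users is treated as a partial outage. That ordering is a design choice of this sketch, picked because a partial outage carries the tighter SLA.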

Service Level Agreement (SLA) Differences: Incidents vs. Major Incidents

For incidents and service requests owned by BTech, the following SLA targets apply:

Priority    Response Within    Resolve Within
Urgent      4 Hours            3 Days
High        8 Hours            5 Days
Medium      12 Hours           10 Days
Low         16 Hours           20 Days

All times are business hours. The business-hours clock runs from 5:00 a.m. to 6:00 p.m. Pacific Time, Monday through Friday.
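Under a business-hours clock, a target keeps running only between 5:00 a.m. and 6:00 p.m., Monday through Friday. The following is a minimal Python sketch of computing a due time under that rule. It assumes "Pacific" means the `America/Los_Angeles` time zone (so it follows daylight saving time) and it ignores company holidays, which the article does not mention.

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

# Assumption: "Pacific" means America/Los_Angeles (PST in winter, PDT in summer).
PACIFIC = ZoneInfo("America/Los_Angeles")
OPEN = time(5, 0)    # business clock starts at 5:00 a.m.
CLOSE = time(18, 0)  # business clock stops at 6:00 p.m.


def _next_open(moment: datetime) -> datetime:
    """Return 5:00 a.m. Pacific on the calendar day after `moment`."""
    next_day = moment.date() + timedelta(days=1)
    return datetime.combine(next_day, OPEN, tzinfo=PACIFIC)


def add_business_hours(start: datetime, hours: float) -> datetime:
    """Return the due time after `hours` of business time, counted from `start`.

    `start` must be timezone-aware. Only 05:00-18:00 Pacific, Monday-Friday,
    counts toward the target; nights and weekends are skipped.
    """
    current = start.astimezone(PACIFIC)
    remaining = timedelta(hours=hours)
    while True:
        opens = datetime.combine(current.date(), OPEN, tzinfo=PACIFIC)
        closes = datetime.combine(current.date(), CLOSE, tzinfo=PACIFIC)
        if current.weekday() >= 5 or current >= closes:
            # Saturday/Sunday, or after 6:00 p.m.: jump to the next morning.
            current = _next_open(current)
            continue
        current = max(current, opens)  # before 5:00 a.m.: wait for opening
        available = closes - current   # business time left today
        if remaining <= available:
            return current + remaining
        remaining -= available
        current = _next_open(current)


opened = datetime(2024, 3, 8, 16, 0, tzinfo=PACIFIC)  # a Friday, 4:00 p.m.
print(add_business_hours(opened, 4))  # 2024-03-11 07:00:00-07:00
```

In the usage line, an Urgent ticket opened Friday at 4:00 p.m. has 2 business hours left that day, so its 4-hour response target falls at 7:00 a.m. the following Monday. The article does not say how day targets such as "3 Days" convert to business hours. If a day means one full 13-hour business day, the Urgent resolution target would be `add_business_hours(opened, 3 * 13)`, but treat that mapping as an assumption.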

For Major Incidents on services owned by BTech, the following SLA targets apply:

Type                       Response Within    Resolve Within*
Full outage                30 Minutes         4 Hours
Partial outage             1 Hour             24 Hours
Performance degradation    2 Hours            3 Days

*If the Major Incident is converted to a Problem, the Major Incident SLA no longer applies.
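The Major Incident targets can be expressed as a lookup keyed by type. This sketch assumes the targets run on the wall clock around the clock (the article states business hours only for the regular incident table) and reads "3 Days" as 72 hours; confirm both before relying on it. Because the asterisk sits on the Resolve Within column, the sketch drops only the resolution deadline once an incident is converted to a Problem.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# (respond within, resolve within) per Major Incident type, from the SLA table.
# Assumption: wall-clock time, with "3 Days" read as 72 hours.
MAJOR_INCIDENT_SLA = {
    "Full outage": (timedelta(minutes=30), timedelta(hours=4)),
    "Partial outage": (timedelta(hours=1), timedelta(hours=24)),
    "Performance degradation": (timedelta(hours=2), timedelta(days=3)),
}


def sla_deadlines(
    incident_type: str, opened: datetime, converted_to_problem: bool
) -> Tuple[datetime, Optional[datetime]]:
    """Return (respond_by, resolve_by); resolve_by is None after conversion to a Problem."""
    respond, resolve = MAJOR_INCIDENT_SLA[incident_type]
    resolve_by = None if converted_to_problem else opened + resolve
    return opened + respond, resolve_by


opened = datetime(2024, 5, 1, 9, 0)
print(sla_deadlines("Full outage", opened, False))
```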

Procedures Overview for Major Incidents

When the Service Desk begins to spot a trend in the tickets being reported, this procedure guides anyone on the Service Desk through starting the Major Incident Management process.

At the first sign of a possible issue trend (3+ similar tickets, or a complete outage or unavailability of a product or service), post a note describing what you are seeing in the Daily Trending Issues channel of the Service Desk Teams site.
- Office outages, whether partial or full, should always generate a Major Incident.
Create a new Incident with a clear summary of the issue in the Subject field and a description that states why the Major Incident is being created.
Convert the Incident to a Major Incident and fill in the required fields:
1. Business impact
2. Impacted locations / services
3. No. of customers impacted
4. Incident commander (list Jason Fearing if you don't know who it should be)
5. Swinerton Managed Service (Yes/No)
Post the new tracking ticket number in the Daily Trending Issues channel so the Service Desk Team can associate any new incidents with it.
Associate all other related incidents with this Major Incident record
Coordinate who will update the Freshservice Announcement Page (Joe -> Jason -> Scott)
Contact the Product Manager of the affected product (see the Products and Service Catalog)
Coordinate who will update the BTech Alert Page (Product Managers -> Jason -> Scott -> Jack)
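The trigger rule and the required-field check in the steps above can be sketched in Python. Everything here is hypothetical: the `Ticket` shape, the `issue_key` used to decide that tickets are "similar," and the field names are illustrative and are not Freshservice's schema or API. The sketch only encodes the rules stated in this procedure: 3 or more similar tickets, any complete outage, or any office outage starts the process, and Jason Fearing is the fallback Incident Commander.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Set

TREND_THRESHOLD = 3  # "3+ similar tickets"
DEFAULT_COMMANDER = "Jason Fearing"  # fallback when the commander is unknown
REQUIRED_FIELDS = (  # hypothetical keys for the five required fields
    "business_impact",
    "impacted_locations_services",
    "customers_impacted",
    "incident_commander",
    "swinerton_managed_service",
)


@dataclass
class Ticket:
    ticket_id: int
    issue_key: str                 # hypothetical grouping key for "similar" tickets
    complete_outage: bool = False  # product or service fully unavailable
    office_outage: bool = False    # partial or full office outage


def issues_needing_major_incident(tickets: Iterable[Ticket]) -> Set[str]:
    """Return issue keys that should start the Major Incident process."""
    tickets = list(tickets)
    counts = Counter(t.issue_key for t in tickets)
    flagged = {key for key, n in counts.items() if n >= TREND_THRESHOLD}
    # A complete outage or any office outage qualifies on its own.
    flagged |= {t.issue_key for t in tickets if t.complete_outage or t.office_outage}
    return flagged


def prepare_major_incident(fields: dict) -> dict:
    """Fill the default commander, then fail if any required field is still empty."""
    record = dict(fields)
    if not record.get("incident_commander"):
        record["incident_commander"] = DEFAULT_COMMANDER
    missing = [name for name in REQUIRED_FIELDS if record.get(name) in (None, "")]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    return record
```

Keeping `swinerton_managed_service` as a boolean means an explicit "No" (False) still counts as filled in, which matters later when metrics are filtered by managed service.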

Ticket Logging Expectations

Because Major Incidents have increased visibility, log all related interactions in real time and attach appropriate screenshots. If you use external tools, or make or receive phone calls, record that information in the ticket. This is standard practice for all tickets, but for Major Incidents it is imperative that these entries are made promptly, as close as possible to when the events occur.

Role of Incident Commander

The Incident Commander is a designated individual, typically within the Service Desk unless otherwise delegated, who is responsible for coordinating the mobilization of support efforts related to the Major Incident.

This individual is responsible for verifying that the Major Incident record reflects the current status and for coordinating follow-up efforts and communications with impacted employees and stakeholders. Specifically, the Incident Commander will:

Verify that all related tickets are closed once the issue has been resolved.
Work with Product Managers for communication updates.
Coordinate updates to outgoing communications.
Coordinate the Review and Resolution sequence for an “after-action” report to BTech leadership.
Close out the Major Incident.
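The Incident Commander's close-out duties amount to a checklist that must be clear before the Major Incident is closed. This is a hypothetical sketch, not a Freshservice feature: the `MajorIncident` fields and the "Closed" status string are assumptions, and requiring the after-action report before close-out follows the order of the list above rather than an explicit rule.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MajorIncident:
    # Hypothetical record; field names are illustrative only.
    number: str
    resolved: bool = False
    related_ticket_status: Dict[str, str] = field(default_factory=dict)
    after_action_sent: bool = False


def close_out_blockers(mi: MajorIncident) -> List[str]:
    """List what still prevents the Incident Commander from closing `mi`."""
    blockers = []
    if not mi.resolved:
        blockers.append("issue not yet resolved")
    still_open = sorted(
        t for t, status in mi.related_ticket_status.items() if status != "Closed"
    )
    if still_open:
        blockers.append("related tickets still open: " + ", ".join(still_open))
    if not mi.after_action_sent:
        blockers.append("after-action report not yet sent to BTech leadership")
    return blockers
```

An empty list means the record can be closed; anything else names the remaining work.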

To ensure appropriate and timely communication during a Major Incident, the Incident Commander will create a Teams group chat with the appropriate individuals. A daily status summary will be emailed to BTech leadership.

Swinerton Managed Service

For outages and degradations of products and services that BTech supports but does not manage, still use the Major Incident Management system. When creating the Major Incident, indicate whether the affected service is a managed service.

The SLA timer runs for all Major Incidents, but only services managed by BTech are factored into metric reporting.
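The reporting rule can be sketched as a filter applied before computing SLA compliance: every Major Incident keeps its SLA timer, but only BTech-managed services count toward the metric. The record fields are hypothetical, and the sketch assumes the "Swinerton Managed Service" Yes/No field records whether BTech manages the service. It also leaves out incidents converted to a Problem, on the reading that the Major Incident SLA no longer applies to them.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ClosedMajorIncident:
    # Hypothetical fields for illustration.
    number: str
    managed_by_btech: bool  # assumed to mirror the "Swinerton Managed Service" answer
    met_resolve_sla: bool
    converted_to_problem: bool = False


def resolve_sla_compliance(incidents: Iterable[ClosedMajorIncident]) -> Optional[float]:
    """Fraction of reportable Major Incidents resolved within SLA (None if none qualify)."""
    reportable = [
        i for i in incidents
        if i.managed_by_btech and not i.converted_to_problem
    ]
    if not reportable:
        return None
    return sum(i.met_resolve_sla for i in reportable) / len(reportable)
```

Returning None rather than 0.0 when nothing qualifies keeps an empty reporting period from reading as total SLA failure.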