LTS Incident Response Charter
I. Problem/Value Statement
Problem Statement
The LTS portfolio frequently experiences service disruptions. Whether it is on-call staff restoring a service at 4 a.m., repeated automated system reboots, or an “all hands on deck” situation to contain and resolve an incident, LTS staff regularly implement fixes that resolve the issue at hand. Unfortunately, these short-term fixes often leave the underlying cause unidentified. This has resulted in gradually increasing system instability, which is unsustainable over the long term for both LTS human resources and the reliability of our service offerings. Additionally, and despite best intentions, LTS incident communication for extended outages can be uneven and opaque, which may erode trust with stakeholders over time.
Business Value
LTS aspires to improve its incident response process so that late-night fixes, automated service restorations, and “all hands on deck” situations decline measurably over time. This work will include establishing communication norms and more efficient incident coordination. These improvements should correspond with increased service availability, faster incident resolution, and stronger relationships with stakeholders.
II. Vision and Approach
Step 1: Revise our Service Level Agreement (SLA)
Review and revise our existing SLA levels so that they more clearly and accurately reflect the types of incidents and the range of system and service offerings in the LTS portfolio. LTS will communicate the revised SLA to Library leadership.
Step 2: Review our services and map to the revised SLA
Review each service in the LTS portfolio to determine how it maps to the revised SLA, and associate systems and services with SLA levels accordingly. Collaborate with Business Owners to ensure the SLAs reasonably reflect service needs.
Step 3: Update the LTS incident lifecycle process
From initial incident report to incident resolution, LTS will update, codify, communicate, and socialize (internally and externally) its practices throughout the incident lifecycle.
MYGOs and OKR alignment
Our vision aligns with the following Harvard Library multi-year goals and objectives (MYGOs) and HUIT objectives and key results (OKRs):
- MYGOs
  - Review our organizational structures and operational practices, and implement any appropriate changes
  - Adopt a sustainable approach to technology lifecycle management and prepare for the future
- OKRs
  - Achieve operational excellence across our services
III. In Scope/Out of Scope
In Scope
- Review and revise SLA terms and assignments
- Identify key service contacts
- Associate revised SLAs with all LTS systems
- Define and identify incident management roles
- Establish communication norms for outages
- Use SLAs to determine escalation path resourcing
- Formalize incident tracking and reporting
- Design and implement root cause analysis practice
- Determine parameters for initiating disaster recovery remediation
- Establish reporting to measure success and identify patterns
- Align LTS MI process with the principles of the central HUIT MI process
Out of Scope
- Incidents resolved through automated service restoration
IV. Deliverables/Work Products
- Revised SLA terms and assignments
- Clearly defined escalation paths
- Incident role definitions with relevant staff identified for all LTS portfolio services
- Communication norms for incidents
- Root cause analysis practice with blameless postmortems
- LTS Incident Response Playbook, created from updated incident templates/workflows
Definition of Done
This project will be done when the above work products have been completed, new SLA terms and assignments have been approved by Library leadership and communicated to stakeholders, and LTS internal processes have been revised and moved into practice for all systems.
V. Stakeholders and Project Team
Name | Title/Affiliation | Participation |
---|---|---|
Stu Snydman | Associate University Librarian and Managing Director, LTS | Executive Sponsor |
Sharon Bayer | Director, Systems Deployment and Integration, LTS | Project Lead |
Laura Morse | Director, Library Systems & Support, LTS | Project Lead |
Vice President Direct Reports | Harvard Library Leadership | Stakeholder |
Business Owners | Harvard Library | Stakeholder |
Sara Rubinow | Senior IT Project Manager, LTS | Project Consultant |
VI. Acceptance
Accepted by Stu Snydman, 3/16/23
Prepared by Sara Rubinow