LTS Incident Response Charter

I. Problem/Value Statement

Problem Statement

The LTS portfolio frequently experiences service disruptions. Whether it is on-call staff restoring a service at 4 a.m., repeated automated system reboots, or an “all hands on deck” situation to contain and resolve an incident, LTS staff regularly implement fixes that resolve the issue at hand. Unfortunately, these short-term fixes often leave the underlying cause unidentified. This has resulted in gradually increasing system instability, which is unsustainable over the long term for both LTS human resources and the reliability of our service offerings. Additionally, and despite best intentions, LTS incident communication for extended outages can be uneven and opaque, which may erode trust with stakeholders over time.

Business Value

LTS aspires to improve its incident response process so that there is a measurable reduction over time in late-night fixes, automated service restoration, and “all hands on deck” situations. This work will include establishing communication norms and more efficient incident coordination. These improvements will correspond with an increase in service availability, accelerated incident resolution, and strengthened relationships with stakeholders.


II. Vision and Approach

Step 1: Revise our Service Level Agreement (SLA)

Review and revise our existing SLA levels so that they more clearly and accurately reflect the types of incidents and the range of system and service offerings in the LTS portfolio. LTS will communicate the revised SLA to Library leadership.

Step 2: Review our services and map to the revised SLA

Review each service in the LTS portfolio to determine how it maps to the revised SLA, and associate systems and services with SLA levels accordingly. Collaborate with Business Owners to ensure the SLAs reasonably reflect service needs.

Step 3: Update the LTS incident lifecycle process

From initial incident report to incident resolution, LTS will update, codify, communicate, and socialize (internally and externally) its practices throughout the incident lifecycle.

MYGO and OKR Alignment

Our vision aligns with the following Harvard Library multi-year goals and objectives (MYGOs) and HUIT objectives and key results (OKRs): 

    • MYGOs
      • Review our organizational structures and operational practices, and implement any appropriate changes
      • Adopt a sustainable approach to technology lifecycle management and prepare for the future
    • OKRs
      • Achieve operational excellence across our services

III. In Scope/Out of Scope

In Scope

    • Review and revise SLA terms and assignments
    • Identify key service contacts
    • Associate revised SLAs with all LTS systems
    • Define and identify incident management roles
    • Establish communication norms for outages
    • Use SLAs to determine escalation path resourcing
    • Formalize incident tracking and reporting
    • Design and implement root cause analysis practice
    • Determine parameters for initiating disaster recovery remediation
    • Establish reporting to measure success and identify patterns
    • Align LTS MI process with the principles of the central HUIT MI process

Out of Scope

    • Incidents resolved through automated service restoration

IV. Deliverables/Work Products

    • Revised SLA terms and assignments 
    • Clearly defined escalation paths
    • Incident role definitions with relevant staff identified for all LTS portfolio services
    • Communication norms for incidents
    • Root cause analysis practice with blameless postmortems
    • LTS Incident Response Playbook, created from updated incident templates and workflows

Definition of Done

This project will be done when the above work products have been completed, new SLA terms and assignments have been approved by Library leadership and communicated to stakeholders, and LTS internal processes have been revised and moved into practice for all systems.

V. Stakeholders and Project Team

Name                           Title/Affiliation                                          Participation
Stu Snydman                    Associate University Librarian and Managing Director, LTS  Executive Sponsor
Sharon Bayer                   Director, Systems Deployment and Integration, LTS          Project Lead
Laura Morse                    Director, Library Systems & Support, LTS                   Project Lead
Vice President Direct Reports  Harvard Library Leadership                                 Stakeholder
Business Owners                Harvard Library                                            Stakeholder
Sara Rubinow                   Senior IT Project Manager, LTS                             Project Consultant


VI. Acceptance

Accepted by Stu Snydman, 3/16/23

Prepared by Sara Rubinow