Still image remediation project charter

I. Problem/Value Statement

Problem Statement

DRS currently sets a very low bar for the quality of the media that can be uploaded as archival masters. This allows users to preserve files even if the only available copy is not deemed preservation-worthy according to DP community standards, which is better than having no version of the content available at all. However, the lack of any validation and notification for files that may be unsuitable for online visualization has created numerous problems in MPS, the LTS delivery service. The effect is especially severe for still images, the majority of which are stored as losslessly compressed JPEG2000 (JP2). Many of these images cause severe performance penalties or even crashes of the LTS delivery servers. Thus, a comprehensive scan of the current image collection that identifies the problematic images, as well as a pathway to convert them, if possible, to valid, web-optimized images, has been deemed necessary.

In order to optimize images for web display, as well as to reduce storage costs on a very expensive storage tier of our infrastructure, ad-hoc derivative images should be generated that take advantage of the most advanced encoding and compression algorithms.

This approach poses a problem, because currently the same lossless JP2s are used for both archiving and delivery. When an image is published on DRS, the archival master is simply copied over to a delivery location. Image identifiers and file names are stored in delivery and name resolution systems, and cannot be easily separated. Additional steps need to be devised to replace the current references to the archival master copies with references to the new delivery derivatives.

Business Value

Leaving delivery images in their current state will continue to cause major disruption to delivery services and to drain LTS staff time. Hence, however laborious the process of remediating and separating out archival and delivery files may be, it is necessary and worth the effort. This initiative would bring several advantages, including:

Overall better quality and performance of delivery services;
Reduced LTS developer and tech support time in the long run;
Lower storage, I/O, and computing costs for delivering images;
Better control over, and insight into, the quality of our images;
Content providers would no longer be obligated to provide JPEG2000 for archival masters (as long as they are provided in a preservation-worthy format, or can be converted into one upon deposit, different original image formats can be optimized into delivery formats);
Better separation of concerns between archival and delivery files.

II. Vision and Approach

The first step of this project would be to identify the types of problems, their sources, the number of affected images, and the severity of each problem from the perspectives of end users and maintenance staff. Once that is completed, several remediation projects can be planned and prioritized according to the severity and extent of the issues that would be resolved.

In order to start any actual remediation process, several other things need to be done:

Identifying the source of each issue type, and ensuring that no new images with the same type of problem are introduced. This may require contacting the original depositors and inquiring about the source of the images (often they come from third-party sources, or have been digitized by individual departments' staff without any quality control by an imaging specialist).
Offering an alternative way to generate delivery derivatives. This alternative would be the media transformation tools that LTS is building for massive-scale media conversion.
Devising and implementing a process to replace internal references to current delivery files with references to the newly generated ones in LTS services.

Our vision aligns with the following Harvard Library multi-year goals and objectives (MYGOs):

MYGO #8: Focus technical services on effective workflows and metadata that matter the most
- By replacing current delivery images with high-quality, high-performance ones, we would improve what is likely one of the most critical aspects of our public-facing services.
MYGO #10: Focus on space as a service, considering the most cost-effective approaches to user interests, collections security and preservation, and staff needs in HL and HCL facilities
- By converting existing lossless DRS images into a delivery-optimized format we would significantly reduce storage, computing and I/O usage on the most traffic-heavy (and expensive) tier of our infrastructure.
MYGO #14: Minimize the environmental impact of collections, services, and spaces
- As described in MYGO #10, the goal of this project is to save computing resources, thus reducing the environmental footprint of our services.

Our vision aligns with the following HUIT objectives and key results (OKRs):

Identify 20 candidate services that are “at risk” or “unsustainable” and produce action and/or remediation plans
- This project seeks to make our image delivery services more stable and efficient. These services have often been flagged as highly problematic due to their instability, which is caused in great part by inadequate delivery media.

III. In Scope/Out of Scope

In Scope

Identifying substandard and problematic image files
Generating reports on problematic images, grouped by types of issues, including extent and severity of the impact on user services and staff time
Developing tools to update internal image references
Prioritizing, planning, and running remediation batches
Removing original delivery files once they have been verified to be replaced successfully

Note that this project is limited to still images. This includes mostly, but not exclusively, JPEG2000s. Other media will be covered by separate projects as necessary.
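
The reporting item above could take many forms. As a minimal sketch only, assuming each scan result is recorded with an image identifier, an issue type, and a numeric severity (the Finding record, the issue labels, and the 1-3 severity scale are hypothetical and not part of any existing LTS tool), findings could be grouped by issue type as follows:

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One problematic image, as a hypothetical scan might record it."""
    image_id: str    # placeholder identifier, not a real DRS ID format
    issue_type: str  # made-up label for a class of problem
    severity: int    # assumed scale: 1 = minor, 3 = crashes delivery


def summarize(findings):
    """Group findings by issue type.

    Returns (issue_type, image_count, worst_severity) rows, sorted by
    worst severity first, then by number of affected images.
    """
    counts = Counter(f.issue_type for f in findings)
    worst = defaultdict(int)
    for f in findings:
        worst[f.issue_type] = max(worst[f.issue_type], f.severity)
    rows = [(issue, counts[issue], worst[issue]) for issue in counts]
    rows.sort(key=lambda row: (-row[2], -row[1], row[0]))
    return rows


if __name__ == "__main__":
    sample = [
        Finding("img-1", "type_a", 1),
        Finding("img-2", "type_b", 3),
        Finding("img-3", "type_a", 2),
    ]
    for issue, count, severity in summarize(sample):
        print(f"{issue}\t{count}\t{severity}")
```

Sorting by worst severity and then by extent mirrors the prioritization criteria named in the Vision and Approach section; a real report would also need to capture impact on staff time, which this sketch does not model.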

Out of Scope

The following tasks are out of scope for this project; however, they are dependencies for it to start:

Development of a microservice for converting images based on configurable profiles (imgconv)
Integration of imgconv into an automation framework for large scale processing (drs-pipelines)

IV. Deliverables and Work Products

Definition of Done

This project will be considered done once:

All identified images have been converted to the desired delivery-optimized format;
Such images are actively served by our delivery systems;
The older versions of the delivery files have been removed from storage.

V. Stakeholders

Stakeholder	Title	Participation
Stu Snydman	Associate University Librarian and Managing Director, Library Technology	Executive Sponsor, Business Owner

VI. Project Team

Project Role	Team member(s)
Technical Product Owner / LTSLT Owner	Stefano Cossu (LTS)
Software engineers	Brian Hoffman (LTS), JJ Chen (LTS)
QA	Stefano Cossu, Brian Hoffman, JJ Chen, Paul Aloisio (LTS), Imaging Services
Functional documentation	Stefano Cossu, Brian Hoffman, JJ Chen
Scrum Master	Stefano Cossu
Project Manager	Vitaly Zakuta (LTS)

VII. Estimated Schedule (tentative)

Phase	Phase Start	Phase End	Completion Milestone

VIII. Assumptions, Constraints, Dependencies, and Risks

Project Assumptions

Enough time has been allotted for developers to set up one batch at a time.
Remediation processes can take a very long time but should require very little oversight once started.
Some remediation batches can be deferred for a long time if their priority is low.

Project Constraints

Other DRS Futures integration priorities contending for developers' time
Infrastructure capacity and cost (the tools should be scalable, but some batches could still take many weeks or months to complete)
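
To make the duration constraint above more concrete, batch times could be forecast from a small pilot run. The following is a rough sketch, not a description of drs-pipelines: the function names, the linear-scaling assumption, and the doubling growth factor are all illustrative assumptions.

```python
def forecast_seconds(pilot_images, pilot_seconds, total_images):
    """Extrapolate total run time from a pilot batch.

    Assumes per-image processing time stays constant as batches grow,
    which may not hold if the infrastructure does not scale linearly.
    """
    if pilot_images <= 0:
        raise ValueError("pilot batch must contain at least one image")
    return pilot_seconds / pilot_images * total_images


def batch_plan(total_images, first_batch, growth=2):
    """Return progressively larger batch sizes that cover total_images."""
    if first_batch <= 0 or growth < 1:
        raise ValueError("first_batch must be > 0 and growth must be >= 1")
    sizes = []
    remaining = total_images
    size = first_batch
    while remaining > 0:
        take = min(size, remaining)  # the last batch may be smaller
        sizes.append(take)
        remaining -= take
        size *= growth
    return sizes


if __name__ == "__main__":
    # Made-up numbers: a pilot of 100 images that took 250 seconds.
    print(forecast_seconds(100, 250.0, 1000))
    print(batch_plan(1000, 100))
```

The forecast would need to be recomputed after each batch, since early measurements on small batches may not reflect behavior under heavier load.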

Project Dependencies

The following dependencies apply only to the actual remediation process. Other parts of the project (analysis, reporting, etc.) can start immediately.

Image conversion service (imgconv) and ETL framework (drs-pipelines) available in production

Project Risks

Description	Plan	Impact
Remediation process takes an unexpectedly long time	Start with a small batch; extrapolate timing; increase batch size progressively; adjust forecast iteratively	Project timeline
Remediation process is unstable and requires frequent monitoring	Build solid exception handling; perform extensive load testing in advance	Staff time
Large numbers of image conversion tasks fail	Build non-blocking reporting system; allow automatic retrying for 5xx errors; review failures separately	Staff time; may require re-engineering remediation tools or infrastructure
Reference replacement does not go according to plan	Test a small batch of replacement actions; back up databases; report failed replacements so that failures do not fall through the cracks; test each image (HEAD request) before deleting the old version	Lost references / broken images; old images cannot be replaced for a long time and impose extra storage costs
Processes don't scale as expected	Inspect infrastructure for scaling issues; if not resolvable, accept longer completion times	Longer delivery times
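
Two of the mitigations in the table above, automatic retrying for 5xx errors and a HEAD request before deleting the old version, can be sketched as follows. This is only an illustration under stated assumptions: the convert callable stands in for a call to the conversion service and is assumed to return an HTTP status code, the exponential backoff is an added design choice not specified in this charter, and none of the function names belong to imgconv or drs-pipelines.

```python
import time
import urllib.error
import urllib.request


def head_status(url, timeout=10.0):
    """Return the HTTP status code of a HEAD request to url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises for 4xx/5xx; report the code instead


def convert_with_retry(convert, image_id, max_attempts=3,
                       base_delay=1.0, sleep=time.sleep):
    """Call convert(image_id) and retry only when it returns a 5xx status.

    `convert` is a hypothetical callable returning an HTTP status code.
    Non-5xx results (success or client error) are returned immediately,
    so that 4xx failures can be reviewed separately instead of retried.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = None
    for attempt in range(max_attempts):
        status = convert(image_id)
        if status < 500:
            return status
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # assumed backoff: 1s, 2s, 4s...
    return status


def safe_to_delete_old(new_url, probe=head_status):
    """True only if the replacement image answers a HEAD request with 200."""
    try:
        return probe(new_url) == 200
    except OSError:  # covers URLError, timeouts, and connection failures
        return False
```

Passing the probe and sleep functions as parameters keeps the logic testable without network access; in a real run, the old delivery file would be removed only after safe_to_delete_old returns True for its replacement.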

IX. Acceptance

Accepted by [ TODO ]

Prepared by Stefano Cossu

Effective Date: [ TODO ]

Library Technology Services: Staff Documentation Center