...

In order to start any actual remediation process, several other things need to be done:

  1. Identifying the source of each issue type and ensuring that no new images with the same type of problem are introduced. This may require contacting the original depositors and asking about the source of the images (often they come from third-party sources, or have been digitized by individual departments' staff without any quality control by an imaging specialist).
  2. Offering an alternative way to generate delivery derivatives. This alternative would be the media transformation tools that LTS is building for massive-scale media conversion.
  3. Devising and implementing a process to replace internal references to current delivery files with references to newly generated ones in LTS services.

...

  • MYGO #8: Focus technical services on effective workflows and metadata that matter the most
    • By replacing current delivery images with high-performance ones, we would improve what is likely one of the most critical aspects of our public-facing services.
  • MYGO #10: Focus on space as a service, considering the most cost-effective approaches to user interests, collections security and preservation, and staff needs in HL and HCL facilities
    • HTJ2K is the most space- and computationally efficient format for Web quality images today. By converting existing lossless DRS images into a delivery-optimized format, we would significantly reduce storage, computing, and I/O usage on the most traffic-heavy (and expensive) tier of our infrastructure.
  • MYGO #14: Minimize the environmental impact of collections, services, and spaces
    • As described in MYGO #10, the goal of this project is to save computing resources, thus reducing the environmental footprint of our services.

Our vision aligns with the following HUIT objectives and key results (OKRs): 

  • Identify 20 candidate services that are “at risk” or “unsustainable” and produce action and/or remediation plans
    • This project seeks to make our image delivery services more stable and efficient. These services have often been flagged as highly problematic due to their instability, caused in great part by inadequate delivery media.

III. In Scope/Out of Scope

In Scope

  • Identifying substandard and problematic image files
  • Generating reports on problematic images, grouped by types of issues, including extent and severity of the impact on user services and staff time
  • Developing tools to update internal image references
  • Prioritizing, planning, and running remediation batches
  • Removing original delivery files once they have been verified to be replaced successfully
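
As an illustration of the reporting item above, a report grouped by issue type could be assembled roughly as follows. This is a minimal sketch, not the project's tooling: the `ImageIssue` fields, the 1 to 5 severity scale, and the sample issue names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageIssue:
    """One problematic delivery image (all fields are hypothetical)."""
    image_id: str
    issue_type: str  # hypothetical label, e.g. "untiled"
    severity: int    # assumed scale: 1 (minor) to 5 (blocks delivery)


def summarize_by_issue_type(issues):
    """Group issues by type; sort rows by worst severity, then by count."""
    groups = defaultdict(list)
    for issue in issues:
        groups[issue.issue_type].append(issue)
    rows = [
        {
            "issue_type": issue_type,
            "count": len(members),
            "max_severity": max(m.severity for m in members),
        }
        for issue_type, members in groups.items()
    ]
    rows.sort(key=lambda r: (-r["max_severity"], -r["count"], r["issue_type"]))
    return rows


# Hypothetical sample data, for illustration only.
sample = [
    ImageIssue("img-001", "untiled", 3),
    ImageIssue("img-002", "untiled", 4),
    ImageIssue("img-003", "oversized", 2),
]
report = summarize_by_issue_type(sample)
```

Sorting by worst severity first is one possible way to feed batch prioritization; a real report would also need the impact on user services and staff time mentioned above.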

Note that this project is limited to still images. These include mostly, but not exclusively, JPEG2000s. Other media will be covered by separate projects as necessary.

Out of Scope

The following tasks are out of scope for this project; however, they are dependencies for it to start:

  • Development of a microservice for converting images based on configurable profiles (imgconv)
  • Integration of imgconv into an automation framework for large scale processing (drs-pipelines)

IV. Deliverables and Work Products

...

Definition of Done

This project will be considered done once:

  • All identified images have been converted to the desired delivery-optimized format;
  • Such images are actively served by our delivery systems;
  • The older versions of the deliverable have been removed from storage.

V. Stakeholders

Stakeholder | Title | Participation
Stu Snydman | Associate University Librarian and Managing Director, Library Technology | Executive Sponsor, Business Owner

...

Project Role | Team member(s)
Technical Product Owner / LT Owner | Stefano Cossu (LTS)
Software engineers | John Cupitt (independent contractor - development), Pierre-Anthony Lemieux (independent contractor - consulting)*, Brian Hoffman (LTS - integration), JJ Chen (LTS - integration)
QA | Stefano Cossu, Brian Hoffman, JJ Chen, Paul Aloisio (LTS), Imaging Services
Functional documentation | Stefano Cossu, Brian Hoffman, JJ Chen
Scrum Master | Stefano Cossu
Project Manager | Vitaly Zakuta (LTS)

...


VII. Estimated Schedule (tentative)

12/04/2023
Phase | Phase Start | Phase End | Completion Milestone
Development & Release | 11/27/2023 | | Develop code
Development & Release | 12/05/2023 | 12/08/2023 | Unit and integration tests
Development & Release | 12/11/2023 | 12/15/2023 | Complete & validate documentation
Development & Release | 12/11/2023 | 12/22/2023 | imgconv integration and release

VIII. Assumptions, Constraints, Dependencies, and Risks

Project Assumptions

  • Enough time has been allotted for developers to set up one batch at a time.
  • Remediation processes can take a very long time but should require very little oversight once started.
  • Some remediation batches can be deferred for a long time if their priority is low.

Project Constraints

  • Other DRS Futures integration priorities contending for developers' time
  • Infrastructure capacity and cost (the tools should be scalable, but some batches could still take many weeks or months to complete)
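
To put the "weeks or months" figure in perspective, completion time can be roughly extrapolated from a pilot batch. The sketch below assumes throughput scales linearly with batch size and worker count, which may not hold in practice; the function name and all numbers are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def estimate_days(pilot_images, pilot_seconds, total_images, workers=1):
    """Linearly extrapolate total run time (in days) from a pilot batch.

    Assumes constant per-image cost and perfect scaling across workers.
    """
    if pilot_images <= 0 or workers <= 0:
        raise ValueError("pilot_images and workers must be positive")
    seconds_per_image = pilot_seconds / pilot_images
    return seconds_per_image * total_images / workers / SECONDS_PER_DAY


# Hypothetical pilot: 1,000 images in one hour; 2.4 million images in total.
single_worker = estimate_days(1_000, 3_600, 2_400_000)
four_workers = estimate_days(1_000, 3_600, 2_400_000, workers=4)
```

With these made-up figures, a single worker would need about 100 days and four workers about 25, which is why the forecast should be revisited as batch sizes grow.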

Project Dependencies

The following dependencies apply only to the actual remediation process. Other parts of the project (analysis, reporting, etc.) can start immediately.

  • Image conversion service (imgconv) and ETL framework (drs-pipelines) available in production

Project Risks

Description | Plan | Impact | Owner
Remediation process takes an unexpectedly long time | Start with a small batch; extrapolate timing; increase batch size progressively; adjust forecast iteratively | Project timeline |
Remediation process is unstable and requires frequent monitoring | Build solid exception handling; perform extensive load testing in advance | Staff time |
Large numbers of image conversion tasks fail | Build non-blocking reporting system; allow automatic retrying for 5xx errors; review failures separately | Staff time; may require re-engineering remediation tools or infrastructure |
Reference replacement does not go according to plans | Test small batch of replacement actions; back up databases; report failed replacement to avoid failures falling through cracks; test each image (HEAD req) before deleting old version | Lost references / broken images; old images cannot be replaced for a long time and impose extra storage costs |
Processes don't scale as expected | Inspect infrastructure for scaling issues; if not resolvable, accept longer completion times. | Longer delivery times |
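
Two of the mitigations above, retrying on 5xx errors and testing each image with a HEAD request before deleting the old version, could be combined along these lines. This is a sketch under stated assumptions rather than the planned implementation: the function names, retry count, and delay are hypothetical, and the status check is injectable so the logic can be exercised without a live server.

```python
import time
import urllib.error
import urllib.request


def head_status(url, timeout=10.0):
    """Return the HTTP status code of a HEAD request to `url`.

    Connection-level failures (urllib.error.URLError) are not caught here
    and propagate to the caller.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def safe_to_delete_old(url, fetch_status=head_status, retries=3, delay=1.0):
    """True only if the new image answers 200; 5xx responses are retried."""
    for attempt in range(retries):
        status = fetch_status(url)
        if status == 200:
            return True
        if not 500 <= status < 600:
            return False  # e.g. 404: report for separate review, do not retry
        if attempt < retries - 1:
            time.sleep(delay)
    return False  # still failing after all retries: keep the old file
```

Returning False in every doubtful case keeps the old delivery file in place, trading extra storage for the guarantee that no reference ends up pointing at a missing image.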

IX. Acceptance

Accepted by  [ TODO  ]

...