I. Problem/Value Statement

Problem Statement

Part of the DRS Futures project consists of optimizing the data pipelines that feed the LTS delivery services and creating deliverable images that are separate from their archival and production masters, using a format that is optimized for delivery rather than for long-term preservation. This, combined with an extensive remediation of images that are currently undeliverable or very inefficient to deliver because of the way they were originally generated, should result in a drastic reduction in the error rates and resource consumption of media delivery services.

We have identified a format into which we want to convert all the still images in DRS, starting from the most problematic ones. This format is High Throughput JPEG2000 (HTJ2K), a fairly recent standard that is supported by a few encoding and decoding tools. (Note: this conversion would leave the archival images in DRS unchanged; it would only replace the delivery images.)

The leading solution for HTJ2K in terms of encoding speed and efficiency is Kakadu, proprietary software that LTS is licensed to use and currently uses to generate traditional JPEG2000 images. Kakadu, however, has a few disadvantages when it comes to bulk conversion: it offers command-line tools that were never meant to be used in large-scale batch jobs. Additionally, these tools can only convert JPEG2000 to or from another format, not from JPEG2000 to JPEG2000. Since we want to convert old-style JP2 to HTJ2K, we would have to convert each image twice, effectively doubling the conversion time. Multiplied by many millions of images, this can have a dramatic effect on the time scale of the project.
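As a rough illustration of the two-step path described above, the sketch below builds the two command lines a batch job would have to run for each image. It assumes Kakadu's kdu_expand and kdu_compress tools with their usual -i/-o arguments and the Cmodes=HT parameter for HTJ2K output. The file names, the intermediate TIFF, the .jph suffix and the helper two_step_commands are hypothetical, and the commands are only printed, not executed.

```python
from pathlib import PurePosixPath


def two_step_commands(src: PurePosixPath, workdir: PurePosixPath) -> list[list[str]]:
    """Build the two CLI invocations needed to turn one JP2 into HTJ2K.

    Step 1 decodes the old-style JP2 to an intermediate TIFF.
    Step 2 re-encodes that TIFF as HTJ2K.
    Tool names and flags are assumptions; check them against the
    Kakadu version actually installed.
    """
    tmp = workdir / (src.stem + ".tif")   # hypothetical intermediate file
    dst = workdir / (src.stem + ".jph")   # hypothetical HTJ2K output file
    return [
        ["kdu_expand", "-i", str(src), "-o", str(tmp)],
        ["kdu_compress", "-i", str(tmp), "-o", str(dst), "Cmodes=HT"],
    ]


cmds = two_step_commands(PurePosixPath("page_0001.jp2"), PurePosixPath("/tmp/htj2k"))
for cmd in cmds:
    print(" ".join(cmd))
```

Every image pays for two process launches, a full decode, a full encode and an intermediate file on disk. That per-image overhead is what a one-step conversion is meant to remove.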

In order to perform large-scale image conversion and offer the best possible on-demand conversion service for future data pipelines, we need to integrate HTJ2K read and write functionality more tightly with our systems, and be able to perform a one-step conversion from old JP2 to HTJ2K.

Business Value

A more efficient, one-step conversion of existing JP2 images would greatly expedite the clearing of a very large backlog of images. Since we are planning to process tens to hundreds of millions of files, even a small decrease in processing time per image could lead to a very significant reduction in the overall duration of the project. Closely related to improving existing substandard images is resolving their root cause, namely the workflows with which these images were produced and introduced into DRS. Many of these images were produced by individual departments with tools developed ad hoc or outsourced, without central oversight of the quality of the resulting derivatives. A central service for creating derivatives specifically for delivery, using community best practices and standards, would drastically improve both the overall quality of published media and Harvard staff's content production workflows.

Moreover, by integrating a simpler and more efficient conversion tool in our future data transformation pipelines, we would expedite day-to-day operations such as image deposit and publishing in an upcoming DRS Futures scenario, and decrease the time needed, e.g. for a content manager who just deposited a large collection of images, to see that collection published online. This would encourage adoption of an LTS-managed centralized image conversion tool that would deliver consistently high quality images at high speed.
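To make the scale argument above concrete, here is a back-of-the-envelope calculation. All figures (100 million images, 2 seconds per image for a two-step conversion, 1 second for a one-step conversion, 32 parallel workers) are hypothetical placeholders rather than measurements; only the shape of the arithmetic is meant to carry over.

```python
SECONDS_PER_DAY = 86_400


def total_days(image_count: int, seconds_per_image: float, workers: int = 1) -> float:
    """Wall-clock days needed to convert image_count images,
    assuming perfect parallelism across workers (an idealization)."""
    return image_count * seconds_per_image / workers / SECONDS_PER_DAY


# Hypothetical inputs, for illustration only.
images = 100_000_000
workers = 32

two_step = total_days(images, 2.0, workers)   # decode, then re-encode
one_step = total_days(images, 1.0, workers)   # direct JP2 to HTJ2K

print(f"two-step: {two_step:.1f} days")
print(f"one-step: {one_step:.1f} days")
print(f"saved:    {two_step - one_step:.1f} days")
```

Under these assumptions, halving the per-image time halves the duration of the whole backlog run, so a saving of one second per image adds up to weeks of wall-clock time.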

II. Vision and Approach

The ideal scenario for this project is to have at our disposal the tools to convert images from any format that may be stored in DRS to HTJ2K in the simplest and most resource-efficient way.

...

  • Develop a plan for automation in each service area for critical, frequently used or heavily manual workflows
    • This project seeks to optimize one of the most critical and frequently used workflows in the content production chain.

III. In Scope/Out of Scope

In Scope

  • Developing a plug-in for the Vips image library that uses Kakadu for reading and writing JPEG2000 images.
  • Integrating the plug-in into the currently developed imgconv project.
  • Keeping the plug-in up to date with future Vips upgrades
  • Documentation and tests for the developed code

Note that this project is limited to still images. Other media may need a different workflow and approach.

Out of Scope

The following items are out of scope because they are achievable without this project; however, an optimized HTJ2K converter would significantly improve their quality:

  • Development of a microservice for converting images based on configurable profiles (imgconv)
  • Integration of imgconv into an automation framework for large scale processing (drs-pipelines)
  • Conversion of defective and/or substandard delivery images into delivery-optimized HTJ2K

IV. Deliverables and Work Products

  • Code, documentation and tests for a Kakadu plugin for Vips in a Git repository.

Definition of Done

This project will be considered done once:

  • The Vips Kakadu plugin code is completed and committed to a Harvard-owned Git repository
  • We are consistently able to compile and run the plugin
  • Comprehensive tests are written for the key functions
  • All tests pass
  • Exhaustive relevant documentation is provided
  • We are able to integrate the delivered code into our imgconv project and verify that the features and options satisfy our needs.

V. Stakeholders

Stakeholder | Title | Participation
Stu Snydman | Associate University Librarian and Managing Director, Library Technology | Executive Sponsor, Business Owner

VI. Project Team

Project Role | Team member(s)
Technical Product Owner / LTSLT Owner | Stefano Cossu (LTS)
Software engineers | John Cupitt (independent contractor - development), Pierre-Anthony Lemieux (independent contractor - consulting)*, Brian Hoffman (LTS - integration), JJ Chen (LTS - integration)
QA | Stefano Cossu, Brian Hoffman
Functional documentation | John Cupitt
Scrum Master | Stefano Cossu
Project Manager | Vitaly Zakuta (LTS)

* Pierre-Anthony Lemieux deferred to John Cupitt for carrying out the development, remaining available for advising on Kakadu-specific topics. We have not yet established whether such consulting will be pro bono or for a fee. In the latter case, we should request a cost estimate, but ideally we would like to have one contractor billing for the whole project.

VII. Estimated Schedule (tentative)

Phase | Phase Start | Phase End | Completion Milestone
Development & Release | 11/27/2023 | 12/04/2023 | Develop code
Development & Release | 12/05/2023 | 12/08/2023 | Unit and integration tests
Development & Release | 12/11/2023 | 12/15/2023 | Complete & validate documentation
Development & Release | 12/11/2023 | 12/22/2023 | imgconv integration and release

VIII. Assumptions, Constraints, Dependencies, and Risks

Project Assumptions

    • The code delivered by the contractor will be agnostic to external integrations.
    • Integration with imgconv and DRS pipelines, including deployment infrastructure and long-term maintenance, will be the responsibility of the DRS Futures engineering team.
    • Stakeholders will be available to participate in project activities and to complete tasks as requested.
    • The Executive Sponsor and other stakeholders are empowered to make the decisions required for the project to be a success.

Project Constraints

    • Contractor availability
    • DRS Futures team availability
    • Scope
    • Time
    • Budget

Project Dependencies

    • The plugin developer will need a Kakadu SDK license to perform development, testing, and long-term maintenance of the requested software. Kakadu Software has agreed to provide John Cupitt with a free, renewable Kakadu SDK license. The hand-off of that license is underway.

Project Risks

Description | Plan | Impact | Owner

IX. Acceptance

Accepted by  [ TODO  ]

Prepared by Stefano Cossu

...