Reimagining Discovery

Finalizing the addition of finding aids to the index and designing how to display hierarchical items from Finding Aids and CURIOSity. Add CURIOSity items to the index and add document type to the eyebrow on the front end. Confirm LLM selection and system prompts through team testing. Finalize the front end in preparation for release to QA in Sprint 7.

Vision

Revolutionize how researchers, students, and the global community access and explore Harvard's extensive collections, making all kinds of information easily discoverable and accessible.

Project Goals

  • Enhance user experience 
  • Improve discovery and accessibility of special and archival collections and all types of digital collections including but not limited to image, text, audio, video, born digital, immersive (3d, XR, VR, MR), GIS, etc.
  • Integrate distinct digital collections discovery platforms, including developing a new one
  • Investigate and use AI-powered tools to enhance user experience and metadata

Problem and Value Statements

Problem Statement

Since its founding, Harvard Library has been a guardian of the University’s memory and a gateway to the world's knowledge. We currently host an array of discovery systems that use different design approaches, organizational priorities, and technology standards. Scholars and the public expect to be able to find trustworthy information and discover resources easily regardless of the system that is managing and providing access to it.

Solution Business Value

By enabling rich cross-collection search, this project will offer end users intuitive, contextual discovery of special collections, archives and digital collections, through a mix of conversational interfaces, browsing that emphasizes the visual nature of materials when appropriate, and recommendations for similar or related resources, all informed by ongoing user research.

Alignment with Harvard Library Multi-Year Goals and Objectives

This project aligns with the FY 24 HL Goals:

  • Diversify and expand access to knowledge
  • Maximize the breadth of tangible and digital collections across Harvard and peer institutions, for the benefit of all partners

  • Increase our focus on acquiring, accessing, and creating digital content that is accessible to all, as open as possible, and permits creative uses of collections as data 

  • Invest in open access infrastructure and services that support equitable, sustainable models for scholarly communication and open knowledge

Scope

In Scope

  • Cross collection search for special and archival collections, focusing on the end user experience and making clear the relationships between archival objects/items and larger collections
  • Incorporate AI/ML technologies to offer natural language search, and generative AI features like summarization, while retaining baseline search and browse functions
  • Provide access to digital content and act as a replacement for HOLLIS for Images and Harvard Digital Collections, extending their use cases to meet project goals: full text searching, born digital, GIS, A/V
  • Reimagine the metadata pipeline using new AI/ML technologies

Out of Scope

  • Discovery and access to licensed resources (articles, databases) and general collections

Deliverables and Work Products

Key Tasks and Outcomes 

Project Tracker Coming Soon!

Sprints | Outcome | Responsible Parties
Sprint 1 | Gained foundational understanding of back end, and established collaboration practices with each other and other HUIT and LTS colleagues. Demo was not recorded. | Technical Project Team
Sprint 2 | Investigated front end frameworks and decided on React, diagramed a draft front end architecture, and "made real" step 3 (semantic retrieval) in order to help begin the front end work. See recording of demo here. | Technical Project Team
Sprint 3 | Initialize front end development (big win: to work with fastapi for semantic retrieval), finish deploy of semantic retrieval, and experiment with one LLM generative feature and finish indexing the Finding Aids. See recording of demo here. | Technical Project Team
Sprint 4 | Continuing work on front end, making it deployable on dev and finishing back end generative AI features work. Planning for usability testing. See recorded demo here. | Technical Project Team
Sprint 5 | Fix the data issues with Finding Aids, add new set to index and investigate adding CURIOSity items to index. Finalize front end work and create end to end testing. By end of sprint, estimate when usability can begin. See a recording of the demo here. | Technical Project Team
Sprint 6 | Finalizing adding finding aids to index, designing for how to display hierarchical items from Finding Aids and CURIOSity. Add in CURIOSity items to index, add in document type to eyebrow on front end. Confirm LLM selection and system prompts through team testing. Finalize front end in preparation for release to QA in Sprint 7. See a recording of the demo here. | Technical Project Team

Definition of Done

Discovery platform, including access to digital assets, is released on production environment and in use by Harvard constituents and the public. 

Stakeholders

Executive Stakeholders | Title
Martha Whitehead | VP for Harvard Library and University Librarian
Stu Snydman | AUL & Managing Director for Library Technology Services
Salwa Ismail | AUL for Discovery and Access (Jan. 2025)
Tom Hyry | AUL for Archives and Special Collections

The Library Stakeholders are acting as an extended project team, meeting weekly to help inform and prioritize the work.

Library Stakeholders | Title
Amy Deschenes | Head of UX and Digital Accessibility
Kai Fay | Discovery & Access Strategic Projects Manager
Adrien Hilton | Director of Technical Services for Archives and Special Collections
Chelcie Rowell | Associate Head of Digital Collections Discovery
Shalimar Fojas White | Herman & Joan Suit Librarian, Fine Arts Library
Student interns, as needed | Harvard grad and undergraduate students

Technical Project Team

Team Member | Title | Project Role(s)
Enrique Diaz | Manager of Library Software Engineering | Product Owner (LTS)
Doug Simon | Senior Digital Library Software Engineer | Developer (LTS)
JJ Chen | Digital Library Data Engineer | Developer (LTS)
Maura Meagher | Associate UX Developer | Developer (LTS)
Carolyn Caizzi | Senior IT Project Manager | Project Manager/Scrum Lead (LTS)
Robert Hampton | UX Researcher | UX Researcher/Designer (HL)

Estimated Schedule

Note: The project is managed using the Scrum framework, so these phases and milestones will be adjusted. Below is a tentative plan for Year 1 of the project.

Phase | Phase Start | Phase End | Completion Milestone
1 | July 2024 | September 2024 | Natural language discovery platform with generative AI features for discovering digitized, special and archival collections is built and released to QA for testing.
2 | October 2024 | December 2024 | Platform is tested by end users and improvements are recommended. Research into scaling platform for production is completed. Data pipeline is scoped and work begins. Design process for digitized collections (images) component is completed.
3 | January 2025 | March 2025 | Data pipeline and digitized collections components begin to be built. Decision to soft launch discovery platform is made depending on data pipeline.
4 | April 2025 | June 2025 | Continue building data pipeline and digitized collections components. Platform is monitored for costs, and analytics are gathered and reviewed to plan for a full launch in September 2025.
5-12 | | | Years 2-3 will build out full text search integration, more types of digital collection discovery, and access, as well as continuously improve the platform. Investigation into and possible rollout of workflows for using AI to improve quality of metadata.

Assumptions, Constraints, Dependencies, and Risks

Project Assumptions

    • Stakeholders either have or have identified the appropriate subject matter experts to advise on prioritization of work and other project matters
    • Stakeholders will have made available the time required to participate in project activities and to complete tasks as requested
    • Project sponsor and other stakeholders are empowered to make the decisions required for the project to be a success
    • Project sponsor will provide written approval to move forward with system development when requested as part of incremental/iterative system demonstrations

Project Constraints

    • Scope - Flexible (coverage of all types of digital collections depends on unknowns)
    • Time - Fixed 3-year project
    • Cost - Fixed 3-year budget

Project Dependencies

  • ArcLight implementation project
  • Media Presentation Service upgrade
  • LibraryCloud reimagine or defining a new data pipeline
  • DRS Futures project
  • Rapidly changing LLM industry

Project Risks

Description | Plan | Impact | Owner
Rapidly changing Generative AI space | Build system to be flexible, swap out models easily | Cost, trust | Technical Project Team
Library metadata quality is varied and semantic retrieval works with unstructured data | See if metadata fields can help the quality of embeddings; experiment with different embedding models, focusing on full text content and multi-modal models for digital images | Quality of retrieval | Metadata creators and Technical Project Team
Unexpected changes to other library systems like Aeon, JSTOR Forum | Account for and expect changes from external systems in design of data pipeline | Timeline delays | Technical Project Team
Staff capacity to support work of the project | Meeting weekly with stakeholders to ensure there is enough time to plan for bouts of work that include time from broader staff | Overall project success | Library Stakeholders
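
As an illustration of the plan to build the system to be flexible and swap out models easily, the sketch below shows one possible approach: the generative model sits behind a small interface, and a configuration key selects the implementation. This is a minimal sketch and not the project's actual design. Every name in it (TextGenerator, EchoGenerator, UppercaseGenerator, GENERATORS, summarize) is hypothetical, and the two "models" are local stand-ins that call no real LLM service.

```python
from dataclasses import dataclass
from typing import Protocol


class TextGenerator(Protocol):
    """Hypothetical interface: anything that turns prompts into text."""

    def generate(self, system_prompt: str, user_prompt: str) -> str: ...


@dataclass
class EchoGenerator:
    """Stand-in model for local testing; echoes the user prompt."""

    name: str = "echo"

    def generate(self, system_prompt: str, user_prompt: str) -> str:
        return f"[{self.name}] {user_prompt}"


@dataclass
class UppercaseGenerator:
    """Second stand-in model, used here only to show a swap."""

    name: str = "upper"

    def generate(self, system_prompt: str, user_prompt: str) -> str:
        return f"[{self.name}] {user_prompt.upper()}"


# Hypothetical registry: changing models means changing one key,
# not the calling code.
GENERATORS: dict[str, TextGenerator] = {
    "echo": EchoGenerator(),
    "upper": UppercaseGenerator(),
}


def summarize(record_text: str, model_key: str) -> str:
    """Send a record to whichever generator the key selects."""
    generator = GENERATORS[model_key]
    return generator.generate("Summarize the record for a researcher.", record_text)


if __name__ == "__main__":
    print(summarize("finding aid", "echo"))
    print(summarize("finding aid", "upper"))
```

Under this assumption, calling code such as summarize depends only on the interface, so replacing a model would mean registering a new implementation rather than changing the front end or the retrieval code.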

Acceptance

Accepted by: Library Stakeholders August 8 2024

Prepared by: Carolyn Caizzi

Effective Date: August 9 2024