Vision
Revolutionize how researchers, students, and the global community access and explore Harvard's extensive collections, making all kinds of information easily discoverable and accessible.
Project Goals
- Enhance user experience
- Improve discovery and accessibility of special and archival collections and all types of digital collections including but not limited to image, text, audio, video, born digital, immersive (3D, XR, VR, MR), GIS, etc.
- Integrate distinct digital collections discovery platforms, including developing a new one
- Investigate and use AI-powered tools to enhance user experience and metadata
Problem and Value Statements
Problem Statement
Since its founding, Harvard Library has been a guardian of the University’s memory and a gateway to the world's knowledge. We currently host an array of discovery systems that use different design approaches, organizational priorities, and technology standards. Scholars and the public expect to be able to find trustworthy information and discover resources easily regardless of the system that is managing and providing access to it.
Solution Business Value
By enabling rich cross-collection search, this project will offer end users intuitive, contextual discovery of special collections, archives and digital collections, through a mix of conversational interfaces, browsing that emphasizes the visual nature of materials when appropriate, and recommendations for similar or related resources, all informed by ongoing user research.
Alignment with Harvard Library Multi-Year Goals and Objectives
This project aligns with FY 24 HL Goals:
- Diversify and expand access to knowledge
  - Maximize the breadth of tangible and digital collections across Harvard and peer institutions, for the benefit of all partners
  - Increase our focus on acquiring, accessing, and creating digital content that is accessible to all, as open as possible, and permits creative uses of collections as data
  - Invest in open access infrastructure and services that support equitable, sustainable models for scholarly communication and open knowledge
Scope
In Scope
- Cross-collection search for special and archival collections, including digital content, using natural language search, generative AI features, and the ability to browse
- Replace HOLLIS for Images and Harvard Digital Collections, extending their use cases to meet project goals: full text searching, born digital, GIS, A/V
- Reimagine metadata pipeline using new technologies from AI/ML
Out of Scope
- Discovery and access to licensed resources (articles, databases), traditional library resources (monographs, etc.)
Deliverables and Work Products
Key Tasks and Outcomes
Project Tracker Coming Soon!
Sprints | Outcome | Responsible Parties |
---|---|---|
Sprint 1 | Gained foundational understanding of back end, and established collaboration practices with each other and other HUIT and LTS colleagues. | Technical Project Team |
Sprint 2 | Investigated front end frameworks and decided on React, diagrammed a draft front end architecture, and "made real" step 3 (semantic retrieval) in order to help begin the front end work. See recording of demo here. | Technical Project Team |
Sprint 3 | Initialized front end development (big win: working with FastAPI for semantic retrieval), finished deploying semantic retrieval, experimented with one LLM generative feature, and finished indexing the Finding Aids. See recording of demo here. | Technical Project Team |
Sprint 4 | Continuing work on front end, making it deployable on dev and finishing back end generative AI features work. Planning for usability testing. | Technical Project Team |
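For readers unfamiliar with the semantic retrieval step mentioned in Sprints 2 and 3, the idea can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the record ids and finding aid snippets are invented for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector.

    Assumption: a real system would call an actual embedding model here.
    """
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, records: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Return the k record ids most similar to the query, best first."""
    q = embed(query)
    scored = [(rid, cosine(q, embed(text))) for rid, text in records.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical finding aid snippets, invented for illustration only.
FINDING_AIDS = {
    "fa-1": "letters and diaries of a nineteenth century botanist",
    "fa-2": "photographs of campus buildings and architecture",
    "fa-3": "botanist field notebooks and plant specimens",
}

if __name__ == "__main__":
    for record_id, score in retrieve("botanist diaries", FINDING_AIDS, k=2):
        print(record_id, round(score, 3))
```

In a deployed service, a function like `retrieve` would typically sit behind an API endpoint that the front end calls with the user's query.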
Definition of Done
Discovery platform, including access to digital assets, is released on production environment and in use by Harvard constituents and the public.
Stakeholders
Executive Stakeholders | Title |
---|---|
Martha Whitehead | VP for Harvard Library and University Librarian |
Stu Snydman | AUL & Managing Director for Library Technology Services |
Open | AUL for Discovery and Access |
Tom Hyry | AUL for Archives and Special Collections |
The Library Stakeholders are acting as an extended project team, meeting weekly to help inform and prioritize the work.
Library Stakeholders | Title |
---|---|
Amy Deschenes | Head of UX and Digital Accessibility |
Kai Fay | Discovery & Access Strategic Projects Manager |
Adrien Hilton | Director of Technical Services for Archives and Special Collections |
Chelcie Rowell | Associate Head of Digital Collections Discovery |
Shalimar Fojas White | Herman & Joan Suit Librarian, Fine Arts Library |
Student intern, as needed | Harvard undergraduate student |
Technical Project Team
Team Member | Title | Project Role(s) |
---|---|---|
Enrique Diaz | Manager of Library Software Engineering | Product Owner (LTS) |
Doug Simon | Senior Digital Library Software Engineer | Developer (LTS) |
JJ Chen | Digital Library Data Engineer | Developer (LTS) |
Maura Meagher | Associate UX Developer | Developer (LTS) |
Carolyn Caizzi | Senior IT Project Manager | Project Manager/Scrum Lead (LTS) |
Robert Hampton | UX Researcher | UX Researcher/Designer (HL) |
Estimated Schedule
Note: The project is managed using the Scrum framework, so these phases and milestones will be adjusted as work progresses. Below is a tentative plan for Year 1 of the project.
Phase | Phase Start | Phase End | Completion Milestone |
---|---|---|---|
1 | July 2024 | September 2024 | Natural language discovery platform with generative AI features for discovering digitized, special and archival collections is built and tested by selected end users. |
2 | October 2024 | December 2024 | Platform is adjusted based on user feedback and research into scaling platform for production is completed. Data pipeline is scoped. Design process for digitized collections component is completed. |
3 | January 2025 | March 2025 | Data pipeline and digitized collections components begin to be built. Decision to soft launch discovery platform is made depending on data pipeline. |
4 | April 2025 | June 2025 | Continue building data pipeline and digitized collections components. Platform is monitored for costs and analytics are gathered and reviewed to plan for full launch in September 2025. |
5-12 | | | Years 2-3 will build out full text search integration and discovery of and access to more types of digital collections, and will continuously improve the platform. Workflows for using AI to improve metadata quality will be investigated and possibly rolled out. |
Assumptions, Constraints, Dependencies, and Risks
Project Assumptions
- Stakeholders either have or have identified the appropriate subject matter experts to advise on prioritization of work and other project matters
- Stakeholders will have made available the time required to participate in project activities and to complete tasks as requested
- Project sponsor and other stakeholders are empowered to make the decisions required for the project to be a success
- Project sponsor will provide written approval to move forward with system development when requested as part of incremental/iterative system demonstrations
Project Constraints
- Scope - Flexible (coverage of all types of digital collections depends on unknowns)
- Time - Fixed 3-year project
- Cost - Fixed 3-year budget
Project Dependencies
- ArcLight implementation project
- Media Presentation Service upgrade
- LibraryCloud reimagine or defining a new data pipeline
- DRS Futures project
- Rapidly changing LLM industry
Project Risks
Description | Plan | Impact | Owner |
---|---|---|---|
Rapidly changing Generative AI space | Build system to be flexible, swap out models easily | Cost, trust | Technical Project Team |
Library metadata quality is varied and semantic retrieval works with unstructured data | Test whether including metadata fields improves the quality of embeddings | Quality of retrieval | Metadata creators and Technical Project Team |
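The two mitigation plans in the table above could look something like the sketch below. It assumes a small interface that any embedding model can satisfy, so models can be swapped without touching indexing code, and a helper that chooses which metadata fields feed the embedding. All names here (`EmbeddingModel`, `LengthEmbedder`, `record_to_text`, `index_record`) are hypothetical, and the toy model is a placeholder rather than a real embedding.

```python
from typing import Protocol, Sequence


class EmbeddingModel(Protocol):
    """Any object with this method can serve as the embedding backend."""

    def embed(self, text: str) -> list[float]: ...


class LengthEmbedder:
    """Toy placeholder model (hypothetical): not a real embedding."""

    def embed(self, text: str) -> list[float]:
        words = text.split()
        return [float(len(words)), float(len(text))]


def record_to_text(record: dict[str, str], fields: Sequence[str]) -> str:
    """Join the chosen metadata fields into one labeled string for embedding.

    Missing or empty fields are skipped.
    """
    parts = [f"{name}: {record[name]}" for name in fields if record.get(name)]
    return "\n".join(parts)


def index_record(
    model: EmbeddingModel, record: dict[str, str], fields: Sequence[str]
) -> list[float]:
    """Embed a record using whichever model is passed in."""
    return model.embed(record_to_text(record, fields))


if __name__ == "__main__":
    # Invented sample record, for illustration only.
    sample = {"title": "Field notebooks", "creator": "", "date": "1890"}
    print(record_to_text(sample, ["title", "creator", "date"]))
    print(index_record(LengthEmbedder(), sample, ["title"]))
```

Because `index_record` depends only on the interface, trying a different model or a different set of fields is a change at the call site, which makes it easier to compare retrieval quality across options.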
Acceptance
Accepted by:
Prepared by: Carolyn Caizzi
Effective Date: August 9, 2024