- Problem/Value Statement
- Vision and Approach
- In Scope/Out of Scope
- Deliverables/Work Products
- Stakeholders and Project Team
- Schedule
- Key tasks and outcomes
- Assumptions, Risks, and Constraints
- Appendix: Definitions of Roles
...
...
I. Problem/Value Statement
Problem Statement
The Harvard Geospatial Library (HGL) enables researchers to discover and easily access the wealth of geospatial data available to the Harvard community. Data sets are available from around the world at various scales, from global to local. Each data set is delivered with complete metadata, making it easier to add to a geographic information system (GIS) and compare to other data sets about the same place.
HGL currently uses OpenGeoportal (OGP), a platform that is no longer developed or supported. The platform has led to reliability and stability problems. It is also impossible to make any improvements to the HGL user interface because there are no developers who can work on the OGP source code.
LTS has also developed custom programs for loading data into HGL’s GeoServer, which stores and delivers the map data. After a necessary infrastructure change, the loading programs stopped working for an important category of material. Scanned maps can be loaded, but the process is still very cumbersome.
HGL relies on LTS’s Access Management Service (AMS) to provide authorized access to licensed data sets. AMS is being retired, and current systems are being re-engineered to use more centrally supported Harvard systems for authentication and authorization. HGL will eventually be required to use these centrally supported systems as well.
The Harvard Library intends to modernize its implementation of a geospatial data access & discovery layer, establish a sustainable workflow for data loading, and make geospatial data downloadable.
Business Value
The work proposed here meets a long-standing list of requests made by students, researchers, faculty and stakeholders over the course of several years. This project will follow the recommendation of the Harvard Geospatial Working Group and transition HGL from the current open source platform, OpenGeoportal (OGP, developed at Tufts), to a new open source platform, GeoBlacklight (GBL, developed primarily at Stanford). Harvard will become an active participant in the GBL community of users, which includes many peer institutions, among them 6 Ivy Plus members.
Creating a robust and sustainable environment through which maps and myriad forms of geospatial data can be discovered, explored and downloaded fulfills a core tenet of the Library’s mission, and remediates an unstable and outdated data ingest solution. It is critical that the Library leverage those resources to reduce the practical costs of ownership and development, and increase its viability as a consortial partner in the GIS scholarly community.
...
current DRS storage infrastructure (disk and tape) is in the final year of its service term and must be replaced. At the same time, DRS business owners would like to expand the range of storage options (e.g., cloud, external, etc.) and provide greater flexibility in replication policy.
Business Value
This storage refresh is critical for maintaining continuity of DRS service. It is also expected to lower provisioning and operational costs, which can then be reflected in reduced DRS pricing. Support for a wider range of storage options will streamline future incorporation of new technical solutions. It also enables policy-driven replication, which better aligns curatorially-designated value and goals with the technical characteristics of the storage components that best ensure those goals. This in turn will reduce overall costs (by minimizing the number and type of replicas) and maximize use of finite resources (space no longer used for a copy of A can now be used for a copy of B).
II. Vision and Approach
The redesign of HGL will use the open source GeoBlacklight platform and establish a development-to-production environment for HGL based on LTS protocols and standards. The project will build on the knowledge gained from the S.T. Lee grant project, which used GeoBlacklight to deliver index maps, and will expand the offerings to include all the types of data that are now included in HGL. The redesign will preserve existing discovery capabilities of geospatial data from non-Harvard repositories and reaffirm the Library's commitment to the extensibility of data ingest and discovery from sources beyond the Library.
...
infrastructural capacity will place an emphasis on leveraging existing storage capabilities within the University (e.g., FAS RC) and consortial/commercial options outside the University (e.g., NESE, Iron Mountain). Conceptually, the DRS will now incorporate a storage broker architecture in which each file will be assigned a curatorially-designated storage classification that will control the variable degree of replication. New pricing will reflect the underlying provisioning/operating costs of the various storage options. Billing will reflect the differential prices of the various storage options utilized for a given file.
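The storage broker idea described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration only: the class names ("high", "standard", "low"), the mapping of classes to storage options, and the prices (expressed in arbitrary cost units, not actual DRS rates) do not come from the charter.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical price per GB per quarter, in arbitrary cost units (not real DRS rates).
PRICE_PER_GB = {
    "disk_markley": Decimal("4"),
    "disk_mghpcc": Decimal("4"),
    "s3_ia": Decimal("2"),
    "tape_nese": Decimal("1"),
    "tape_iron_mountain": Decimal("1"),
}

# Hypothetical storage classes: each one lists the options that hold a replica.
STORAGE_CLASSES = {
    "high": ("disk_markley", "disk_mghpcc", "s3_ia", "tape_nese"),
    "standard": ("disk_markley", "tape_nese"),
    "low": ("tape_nese",),
}


@dataclass(frozen=True)
class StoredFile:
    file_id: str
    size_gb: Decimal
    storage_class: str  # curatorially designated classification


def replicas_for(f):
    """Return the storage options that should hold a copy of this file."""
    return STORAGE_CLASSES[f.storage_class]


def quarterly_charge(f):
    """Sum the per-option price over every replica of the file."""
    return sum((PRICE_PER_GB[opt] * f.size_gb for opt in replicas_for(f)), Decimal("0"))


def reclassify(f, new_class):
    """Return the updated file plus the options to copy to and to delete from."""
    old, new = set(replicas_for(f)), set(STORAGE_CLASSES[new_class])
    updated = StoredFile(f.file_id, f.size_gb, new_class)
    return updated, sorted(new - old), sorted(old - new)
```

Under these assumed values, a 10 GB file in the "standard" class is billed 50 units per quarter. Reclassifying it to "low" removes the disk replica, which frees that space for other content and drops the charge to 10 units.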
III. In Scope/Out of Scope
In Scope
Essential interface components
Authorization for restricted sets that doesn’t rely on AMS
Search of data using limits and facets on results
Relevance ranking and weighting - predefined
Index map display support
Index map facet for searching
Dataset preview on a map
Method to download vector and raster data as well as scanned maps
Method to download record metadata
Method to link back to individual record
Essential interoperability components
Method for providing a link from a HOLLIS record of single data layer to the single record in HGL
Method for providing a link from a HOLLIS record of a collection of data layers to a search result in HGL with all the data layers
Method for providing HGL records available in HOLLIS
Method for sending metadata records to OpenGeoMetadata (https://github.com/OpenGeoMetadata) on at least an intermittent basis
Preserve existing discovery capabilities of geospatial data from non-Harvard institutions and commitment to extensibility of data ingest and discovery beyond Harvard Library
Essential infrastructure components
Dev/QA/Prod servers running GeoBlacklight
Solr index with current HGL data in GeoBlacklight Schema
Supported storage for index map GeoJSON files
Method for depositing data into GeoServer - and determining which data types will be supported
Data deposit method that is extensible to new spatial data sources outside of the Map Collection
Method for having developers/designer commit changes to interface and view
Evaluate current version of HGL GeoServer for compatibility with required functionality in GeoBlacklight
Evaluate need for database tables used for data export and download
Evaluate GeoCombine as a tool for managing standardized GIS metadata - to inform data publishing decisions
Evaluate and document a dev upgrade path for GeoServer and, if needed, its implications for data migration
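As a rough illustration of the in-scope faceted search against the Solr index, the sketch below builds a Solr select URL with facets and filter limits. The Solr parameters (`q`, `fq`, `facet`, `facet.field`, `rows`, `wt`) are standard; the core name `hgl`, the base URL, and the field names (taken from the GeoBlacklight 1.0 schema) are assumptions, since the charter does not specify them.

```python
from urllib.parse import urlencode


def build_search_url(base_url, text, facet_fields, filters=None, rows=10):
    """Build a Solr /select URL with facet counts and optional filter limits."""
    params = [
        ("q", text),
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
    ]
    # One facet.field parameter per facet to be counted in the results.
    params += [("facet.field", field) for field in facet_fields]
    # Each limit becomes a filter query (fq); values are assumed not to
    # contain double quotes, which would need escaping.
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core URL and GeoBlacklight 1.0 field names.
url = build_search_url(
    "http://localhost:8983/solr/hgl",
    "boston",
    facet_fields=["dct_provenance_s", "layer_geom_type_s"],
    filters={"dc_rights_s": "Public"},
)
```

GeoBlacklight normally issues such queries itself through Blacklight, so a helper like this would mainly be useful for checking the index directly during data loading.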
Out of Scope
Preserving shopping cart feature from current HGL/OGP that allows for the selection of multiple files for download
Decision on metadata format - FGDC vs ISO
Using persistent identifiers (URNs) for layer names and persistent links (URNs) in metadata
Preservation of vector data in DRS
Preservation of FGDC metadata in DRS
Automated method for sharing metadata records with OpenGeoMetadata
Web mapping services (WMS) and tile mapping services (TMS)
Determining methods for reducing tile cache storage size
GeoServer upgrade - unless it’s for a critical need
Relevance ranking and weighting - user defined
Autosuggest with related terms
Making multiple formats available for ingest and export (GeoJSON, Geodatabase, GeoPackage, CSV)
Making offline datasets discoverable
Making geospatial data from Dataverse available for search and delivery
...
An HGL solution that uses Harvard centralized systems for authentication and authorization of users who want to use licensed data sets.
A GeoBlacklight implementation of HGL that supports search, discovery, display, download and reuse of:
vector and raster datasets
georeferenced historical maps
index maps
An HGL solution that provides access to all data in the current HGL implementation
Supported and documented method for depositing data into HGL
Supported and documented method for storing new index map data for use in HGL
Supported and documented infrastructure for Dev/QA/Prod instances of HGL
Supported and documented methods for updates and upgrades to HGL components including GeoBlacklight, GeoServer, and Solr
Understanding of performance expectations related to rendering large historic maps
Evaluation of need for custom database tables to support integration with Alma and downloads of DRS files
Evaluation of GeoCombine as a tool for managing standardized GIS metadata - to inform data publishing decisions
Definition of “Done”
The HGL/GeoBlacklight project will be considered done when:
Stakeholders accept that in-scope work has been delivered
Operations team has the tools to support system deployments and upgrades
HGL with GeoBlacklight front-end is deployed to production and accessible to users
All current HGL data layers are discoverable and deliverable
Stakeholders accept plan for GeoServer upgrade
Documented plan to fully retire old HGL
...
- BatchBuilder
- WebAdmin
Essential interoperability components
- Starfish storage management infrastructure
Essential infrastructure components
- HUIT Research Computing managed storage infrastructure at Markley
- FAS RC storage infrastructure at MGHPCC
- AWS S3 Infrequent Access
- Tape infrastructure at NESE
- Tape warehousing at Iron Mountain
- Snowball devices for movement of deliverable content to S3
- Oracle DB
- Fscheckd monitoring script
- Quarterly billing script
Out of Scope
- Interoperability with storage at FAS RC's HPC cluster at MGHPCC
- Interoperability with USC DR
IV. Deliverables/Work Products
- HUIT Research Computing ECS at Markley
- FAS RC ECS at MGHPCC
- AWS S3 Infrequent Access
- Tape at NESE
- Tape at Iron Mountain
- Encryption of level 4 content into the tape environments
- Definition of OCFL structure for ECS file system and S3 object storage
- Retrospective classification of the storage class of all existing files
- Extension to DB schema
- BatchBuilder and WebAdmin support for storage (re)classification
- New pricing scheme based on partial cost recovery
- Revised quarterly billing script supporting non-uniform storage replication
- Updated BatchBuilder and WebAdmin online documentation and training material
- Switch to a different SFTP client and updates of depositor scripts to interact with ECS-hosted drop boxes; updates to "remote dropbox" setups that exist for Media Preservation, Imaging Services and Harvard Art Museums - related training, testing and documentation
Definition of "Done"
The DRS Refresh project will be considered done when:
- New hardware/software components are fully deployed
- Migration of all retrospective content to new storage environment
- Disposition of all prospective content to new storage environment
- BatchBuilder and WebAdmin updates are complete and training and documentation describe how to specify/update curatorial storage class attribute
- Dynamic storage reallocation (delete/copy as necessary) fully functional at point of initial deposit and on an ad hoc basis via WebAdmin
- Fscheckd monitoring script re-written and pointed at all new storage options
- VPDR/LLT approval of new pricing scheme in consultation with stakeholders
- Quarterly billing script supports non-uniform storage replication
- Decommissioning and removal of old equipment
V. Stakeholders and Project Team
Stakeholders
Stakeholder | Title | Participation |
Stephen Abrams | Head of | Business Sponsor and Service Owner |
Marc McGee | Geospatial Metadata Librarian | Product owner and metadata |
GeoSpatial Working Group | | Advisory and testing support |
Stu Snydman | Associate University Librarian and Managing Director for Library Technology | Advisory |
...
digital preservation/ DRS business owner | Use cases, requirements, conceptual design | |
Stewardship Standing Committee | Review/comment | |
DRS collection managers | Review/comment/training | |
Harvard media and image digitization and preservation practitioners | Review/comment/test | |
Project Team**
Team Member | Role(s) | Affiliation |
Enrique Diaz | Project Co-Manager & Scrum Master | Head of Design & Development, DSI, HL |
Paul Aloisio | Project Co-Manager | Systems Librarian, LTS, HUIT |
Phil Plencner | Software Engineer | Senior Developer, DSI, HL |
Tom Scorpa | Operational Resources | Production Operations Lead, LTS, HUIT |
Marc McGee | Metadata Analyst & Product Owner | ITS, HL |
Scott Walker | Business Analyst | |
Robin Wendler | Metadata Consultant | LTS |
...
Stephen Abrams | Business owner | DPS |
Tricia Patterson | Business owner | DPS |
Vitaly Zakuta | Project manager/Scrum master/Analyst | LTS |
Anthony Moulen | Architect | LTS |
Andrew Woods | Consultant | LTS |
Sharon Bayer | Infrastructure project manager | LTS |
Chris Vicary | Technical lead / software engineer | LTS |
David Neiman | Software engineer | LTS |
Jessica Jassal | Software engineer | LTS |
Valdeva Crema | Software engineer | LTS |
Tom Scorpa | ProdOps lead / storage manager / systems administrator | LTS |
Jason Knight | ProdOps / systems administrator | LTS |
Benson Smith | DB admin | LTS |
Julie Wetherill | Agile product owner / Analyst / QA / Documentation/training | LTS |
Janet Taylor | UI/UX (4/27-on) | LTS |
** Other team members may be added if work requires it
...
VI. Schedule
...
Phase | Phase Start | Phase End | Milestone | Milestone Date |
Planning | 3/9 | 3/29 | Project Charter approved by all stakeholders | 29 |
Preparation | 3/30 | 4/12 | Technical design complete | 4/12 |
| | | Development environment provisioned, configured and running; evaluations completed; development assessment (go/no-go) | 1/19/2021 |
Development | 1/19/2021 | 3/30/2021 | Production-ready codebase ready for QA testing | |
Development | | | All development tasks are complete | |
Move to Production |
|
17 |
|
Check ProdOps for release schedule
...
30 | Move to production complete and accepted by stakeholders |
|
VII. Key tasks and outcomes
Tasks | Outcomes | Responsible Parties |
Approve Project Charter | Agree on Project Charter with regards to: | Business Sponsor |
Meeting schedule | Sprint ceremonies | Project Manager |
Project infrastructure | | Project Manager, Tech Lead, Architect, ProdOps |
Development | | Project Team |
Communication & Outreach planning | CC article, newsletter submission, promotion | Project Co-Managers |
Move new HGL to production | HGL with GeoBlacklight is the public interface for HGL; OpenGeoportal interface shut down | Operational Resources and Project Team |
...
| Project Manager; Business Owner | |
Move to production | Project Team |
VIII. Assumptions, Risks, and Constraints
Constraints
Cost: this project does not account for additional costs incurred by running multiple instances of beta and production in parallel
Assumptions
Stakeholders have identified the appropriate subject matter experts to participate in the project and who can accurately and completely define the business requirements for the project
Stakeholders will have made available the time required to participate in project activities and to complete tasks as requested
Project sponsor and other stakeholders are empowered to make the decisions required for the project to be a success
Existing GeoServer implementation is compatible with newest version of GeoBlacklight
Risks
...
Risk: Reliance on legacy authorization system (AMS).
Plan: Integrate with HarvardKey directly using methods already developed in recent LTS/DSI projects
Impact: Without authentication/authorization, restricted material would not be available for download
Owner: Software Engineer
...
- Service contracts on existing hardware expire in September 2021
- Updated software needs to be in production before start of Fall Semester 2021
Assumptions
- Prior completion of BatchBuilder Java upgrade
Risks
- Risk: Work extends beyond the August 31 expiration of existing hardware
- Plan:
- Impact:
- Owner:
Appendix
Definitions of Roles
- Business Owner - Provide vision and direction of product
- Product Owner - Define, prioritize, and accept work done for project
- Project Manager - Maintain project schedule and communication
- Scrum Master - Lead, guide, and assist project team through development work
- Business Analyst - Provide insight into user needs to inform and refine work stories
- Technical Lead - Lead technical design and development
- Architect - Provide technical architecture for the solution
- Software Engineer - Update and build software to accommodate new storage architecture
- Production Operations - Administer new storage solution system and provide insight into its operation
- DB Admin - Administer databases to accommodate needed functionality and facilitate needed changes
- UI/UX - Create wireframes / mockups of new/updated UI components, and provide guidance on usability of new/updated functionality
Glossary
- FAS RC – Faculty of Arts & Sciences Research Computing (https://www.rc.fas.harvard.edu/)
- MGHPCC – Massachusetts Green High-Performance Computing Center (https://www.mghpcc.org/)
- NESE – Northeast Storage Exchange (https://nese.mghpcc.org/)
- OCFL – Oxford Common File Layout (https://ocfl.io/)
- USC DR – University of Southern California Digital Repository (https://repository.usc.edu/)