...

I. Problem/Value Statement

Problem Statement

The Harvard Geospatial Library (HGL) enables researchers to discover and easily access the wealth of geospatial data available to the Harvard community. Data sets are available from around the world at various scales, from global to local. Each data set is delivered with complete metadata, making it easier to add to a geographic information system (GIS) and compare to other data sets about the same place.

HGL currently uses OpenGeoportal (OGP), a platform that is no longer developed or supported. The platform has led to reliability and stability problems. It is also impossible to make any improvements to the HGL user interface because there are no developers who can work on the OGP source code.

LTS has also developed custom programs for loading data into HGL’s GeoServer, which stores and delivers the map data. After a necessary infrastructure change, the loading programs stopped working for an important category of material. Scanned maps can be loaded, but the process is still very cumbersome.

HGL relies on LTS’s Access Management Service (AMS) to provide authorized access to licensed data sets. AMS is being retired, and current systems are being re-engineered to use more centrally supported Harvard systems for authentication and authorization. HGL will eventually be required to use these centrally supported systems as well.

The Harvard Library intends to modernize its implementation of a geospatial data access and discovery layer, establish a sustainable workflow for data loading, and make geospatial data downloadable.

Business Value

The work proposed here meets a long-standing list of requests made by students, researchers, faculty and stakeholders over the course of several years. This project will follow the recommendation of the Harvard Geospatial Working Group and transition HGL from the current open source platform, OpenGeoportal (OGP, developed at Tufts), to a new open source platform, GeoBlacklight (GBL, developed primarily at Stanford). Harvard will become an active participant in the GBL community of users, which includes many peer institutions, among them six Ivy Plus members.

Creating a robust and sustainable environment through which maps and myriad forms of geospatial data can be discovered, explored and downloaded fulfills a core tenet of the Library’s mission and remediates an unstable and outdated data ingest solution. It is critical that the Library leverage those resources to reduce the practical costs of ownership and development and to increase its viability as a consortial partner in the GIS scholarly community.


II. Vision and Approach

The redesign of HGL will use the open source GeoBlacklight platform and establish a development-to-production environment for HGL based on LTS protocols and standards. The project will build on the knowledge gained from the S.T. Lee grant project, which used GeoBlacklight to deliver index maps, and will expand the offerings to include all the types of data now included in HGL. The redesign will preserve existing capabilities for discovering geospatial data from non-Harvard repositories and reaffirm the Library’s commitment to extensible data ingest and discovery from sources beyond the Library.

III. In Scope/Out of Scope

...

Essential interface components

  • Authorization for restricted sets that doesn’t rely on AMS

  • Search of data using limits and facets on results

  • Relevance ranking and weighting - predefined

  • Index map display support

  • Index map facet for searching

  • Dataset preview on a map

  • Method to download vector and raster data as well as scanned maps

  • Method to download record metadata

  • Method to link back to individual record

Essential interoperability components

  • Method for providing a link from a HOLLIS record of a single data layer to the single record in HGL

  • Method for providing a link from a HOLLIS record of a collection of data layers to a search result in HGL with all the data layers

  • Method for making HGL records available in HOLLIS

  • Method for sending metadata records to OpenGeoMetadata (https://github.com/OpenGeoMetadata) on at least an intermittent basis

  • Preserve existing discovery capabilities of geospatial data from non-Harvard institutions and commitment to extensibility of data ingest and discovery beyond Harvard Library
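
The two HOLLIS linking methods listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the base URL is a placeholder, the `/catalog/<id>` record path and the `f[field][]` facet parameter follow common Blacklight URL conventions, and the use of `dct_isPartOf_sm` as the collection facet is a hypothetical choice rather than a decision made by this project.

```python
from urllib.parse import quote, urlencode

# Placeholder base URL (assumption); the real HGL hostname is not set here.
HGL_BASE_URL = "https://hgl.example.edu"

# Hypothetical facet field used to group data layers into a collection.
COLLECTION_FACET = "dct_isPartOf_sm"


def record_link(layer_id: str) -> str:
    """Build a link to the single HGL record for one data layer."""
    return f"{HGL_BASE_URL}/catalog/{quote(layer_id, safe='')}"


def collection_link(collection_name: str) -> str:
    """Build a link to an HGL search result listing all layers in a collection."""
    query = urlencode({f"f[{COLLECTION_FACET}][]": collection_name})
    return f"{HGL_BASE_URL}/catalog?{query}"
```

A HOLLIS record for a single layer would carry the first kind of link, and a HOLLIS record for a collection would carry the second.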

Essential infrastructure components

  • Dev/QA/Prod servers running GeoBlacklight

  • Solr index with current HGL data in GeoBlacklight Schema

  • Supported storage for index map GeoJSON files

  • Method for depositing data into GeoServer - and determining which data types will be supported

  • Data deposit method that is extensible to new spatial data sources outside of the Map Collection

  • Method for having developers/designers commit changes to interface and view

  • Evaluate current version of HGL GeoServer for compatibility with required functionality in GeoBlacklight

  • Evaluate need for database tables used for data export and download

  • Evaluate GeoCombine as a tool for managing standardized GIS metadata - to inform data publishing decisions

  • Evaluate and document a dev upgrade path for GeoServer and, if needed, its implications for data migration
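
To make the "Solr index with current HGL data in GeoBlacklight Schema" item more concrete, the sketch below assembles one minimal index record. The field names follow the GeoBlacklight 1.0 metadata schema as commonly documented and should be verified against whichever schema version the project adopts; the function names, the sample values, and the "Harvard" provenance string are assumptions for illustration only.

```python
import json


def envelope(west: float, east: float, north: float, south: float) -> str:
    """Format a bounding box in the ENVELOPE(W, E, N, S) syntax used by Solr."""
    if south > north:
        raise ValueError("south latitude must not exceed north latitude")
    return f"ENVELOPE({west}, {east}, {north}, {south})"


def build_record(layer_id: str, title: str, restricted: bool,
                 west: float, east: float, north: float, south: float) -> dict:
    """Assemble a minimal record using GeoBlacklight 1.0 field names (assumed)."""
    return {
        "geoblacklight_version": "1.0",
        "dc_identifier_s": layer_id,
        "layer_slug_s": layer_id,
        "dc_title_s": title,
        "dc_rights_s": "Restricted" if restricted else "Public",
        "dct_provenance_s": "Harvard",  # assumed institution label
        "solr_geom": envelope(west, east, north, south),
    }


if __name__ == "__main__":
    # Hypothetical layer, used only to show the shape of the output.
    sample = build_record("harvard-sample-layer", "Sample Layer", False,
                          -71.2, -70.9, 42.4, 42.2)
    print(json.dumps(sample, indent=2))
```

Setting the rights value from a single `restricted` flag mirrors the charter's distinction between open and licensed data sets; the actual authorization rules are outside this sketch.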

Out of Scope

  • Preserving shopping cart feature from current HGL/OGP that allows for the selection of multiple files for download

  • Decision on metadata format - FGDC vs ISO 

  • Using persistent identifiers (URNs) for layer names and persistent links (URNs) in metadata

  • Preservation of vector data in DRS

  • Preservation of FGDC metadata in DRS

  • Automated method for sharing metadata records with OpenGeoMetadata

  • Web mapping services (WMS) and tile mapping services (TMS) 

  • Determining methods for reducing tile cache storage size

  • GeoServer upgrade - unless it’s for a critical need

  • Relevance ranking and weighting - user defined

  • Autosuggest with related terms

  • Making multiple formats available for ingest and export (GeoJSON, Geodatabase, GeoPackage, CSV) 

  • Making offline datasets discoverable 

  • Making geospatial data from Dataverse available for search and delivery

IV. Deliverables/Work Products  

  • An HGL solution that uses Harvard centralized systems for authentication and authorization of users who want to use licensed data sets.

  • A GeoBlacklight implementation of HGL that supports search, discovery, display, download and reuse of:

    • vector and raster datasets

    • georeferenced historical maps

    • index maps

  • An HGL solution that provides access to all data in the current HGL implementation

  • Supported and documented method for depositing data into HGL

  • Supported and documented method for storing new index map data for use in HGL

  • Supported and documented infrastructure for Dev/QA/Prod instances of HGL

  • Supported and documented methods for updates and upgrades to HGL components including GeoBlacklight, GeoServer, and Solr

  • Understanding of performance expectations related to rendering large historic maps  

  • Evaluation of need for custom database tables to support integration with Alma and downloads of DRS files

  • Evaluation of GeoCombine as a tool for managing standardized GIS metadata - to inform data publishing decisions

Definition of “Done”

The HGL/GeoBlacklight DRS Refresh project will be considered done when:

  • Stakeholders accept that in-scope work has been delivered

  • Operations team has the tools to support system deployments and upgrades

  • HGL with the GeoBlacklight front-end is deployed to production and accessible to users

  • All current HGL data layers are discoverable and deliverable

  • Stakeholders accept plan for GeoServer upgrade 

  • Documented plan to fully retire old HGL

V. Stakeholders and Project Team

...

| Stakeholder | Title | Participation |
| --- | --- | --- |
| Bonnie Burns | Head of Geospatial Resources, Harvard Map Collection | Business Sponsor and Service Owner |
| Marc McGee | Geospatial Metadata Librarian | Product owner and metadata |
| GeoSpatial Working Group | | Advisory and testing support |
| Stu Snydman | Associate University Librarian and Managing Director for Library Technology | Advisory |


Project Team*

| Team Member | Role(s) | Affiliation |
| --- | --- | --- |
| Enrique Diaz | Project Co-Manager & Scrum Master | Head of Design & Development, DSI, HL |
| Paul Aloisio | Project Co-Manager | Systems Librarian, LTS, HUIT |
| Phil Plencner | Software Engineer | Senior Developer, DSI, HL |
| Tom Scorpa | Operational Resources | Production Operations Lead, LTS, HUIT |
| Marc McGee | Metadata Analyst & Product Owner | ITS, HL |
| Scott Walker | Business Analyst | |
| Robin Wendler | Metadata Consultant | LTS |

* Other team members may be added if work requires it

...

| Phase | Phase Start | Phase End | Milestone | Milestone Date |
| --- | --- | --- | --- | --- |
| Planning | | 12/8/2020 | Charter approved | 12/8/2020 |
| Preparation | 12/8/2020 | 1/19/2021 | Development environment provisioned, configured and running; evaluations completed; development assessment (go/no-go) | 1/19/2021 |
| Development | 1/19/2021 | 3/30/2021 | Production-ready codebase ready for QA testing | 3/30/2021 |
| Move to Production | 3/30/2021 | 4/13/2021 | Check ProdOps for release schedule | |


VII. Key tasks and outcomes

| Tasks | Outcomes | Responsible Parties |
| --- | --- | --- |
| Approve Project Charter | Agree on Project Charter with regard to: Stakeholders; Scope; Deliverables; Schedule | Business Sponsor |
| Meeting schedule | Sprint ceremonies | Project Co-Managers |
| Project infrastructure | Populate Jira project board; set up wiki page for LTS Operations; set up dev/qa environments; provision code repository | Project Co-Managers and Business Owner; Operational Resources |
| Development | Implementation of user stories: based on scope and deliverables from charter; reviewed and accepted by Product Owner | Project Team |
| Communication & Outreach planning | Demos to stakeholders, GeoSpatial Working Group, campus community; CC article, newsletter submission, promotion; email communication; live updates to stakeholders (monthly?) | Project Co-Managers |
| Move new HGL to production | HGL with GeoBlacklight is the public interface for HGL; OpenGeoportal interface shut down | Operational Resources and Project Team |

    ...

VIII. Assumptions, Risks, and Constraints

Constraints

  • Cost: this project does not account for additional costs incurred by running multiple instances of beta and production in parallel

Assumptions

  • Stakeholders have identified the appropriate subject matter experts to participate in the project, who can accurately and completely define the business requirements for the project

  • Stakeholders will have made available the time required to participate in project activities and to complete tasks as requested

  • Project sponsor and other stakeholders are empowered to make the decisions required for the project to be a success

  • Existing GeoServer implementation is compatible with the newest version of GeoBlacklight

Risks

  • Risk: A new version of GeoServer may be required, either for essential in-scope GeoBlacklight functionality or because of a security audit failure, incurring additional costs
    Plan: Consult with community expertise; evaluate current GeoServer compatibility by generating a reference set of data to test against it
    Impact: A new version will introduce unknowns around data migration, requiring new plans, adding operations costs, and impeding project velocity.
    Owner: Project team, Business Sponsor

  • Risk: A new GeoBlacklight schema is released during this project’s development sprints with critical changes necessary for in-scope functionality
    Plan: Find tools that can convert to the new schema; if they don’t exist, either build the tools or evaluate eliminating some in-scope project requirements
    Impact: New specifications in a new GeoBlacklight schema required for in-scope functionality would require either reallocating time and resources to address schema conversion or eliminating project must-haves affected by the change.
    Owner: Metadata Analyst

  • Risk: A new version of GeoBlacklight with a critical security fix is released over the course of the project
    Plan: Install and test the new version in development; assess severity of issues, if any
    Impact: Project cannot be deployed to production until security vulnerabilities are addressed
    Owner: Software Engineer

  • Risk: OpenGeoportal (OGP), currently running on Java 8, poses ongoing stability issues that could disrupt HGL availability
    Plan: Replace OGP with GeoBlacklight during this project; rely on in-house experience to remediate outages and evaluate their severity in the interim
    Impact: This is an existing risk, independent of this project. It would disrupt the production instance of HGL, with the level of severity diminishing until the risk is ultimately eliminated once the project is completed.
    Owner: Project Co-Managers

  • Risk: Reliance on legacy authorization system (AMS)
    Plan: Integrate with HarvardKey directly using methods already developed in recent LTS/DSI projects
    Impact: Without authentication/authorization, restricted material would not be available for download
    Owner: Software Engineer


Appendix

    ...