

I. Problem/Value Statement

Problem Statement

The Harvard Geospatial Library (HGL) enables researchers to discover and easily access the wealth of geospatial data available to the Harvard community. Data sets are available from around the world at various scales, from global to local. Each data set is delivered with complete metadata, making it easier to add to a geographic information system (GIS) and compare to other data sets about the same place.


HGL currently uses OpenGeoportal (OGP), a platform that is no longer developed or supported. The platform has led to reliability and stability problems. It is also impossible to make any improvements to the HGL user interface because there are no developers who can work on the OGP source code.


LTS has also developed custom programs for loading data into HGL’s GeoServer, which stores and delivers the map data. After a necessary infrastructure change, the loading programs stopped working for an important category of material. Scanned maps can be loaded, but the process is still very cumbersome.


HGL relies on LTS’s Access Management Service (AMS) to provide authorized access to licensed data sets. AMS is being retired, and current systems are being re-engineered to use more centrally supported Harvard systems for authentication and authorization. HGL will eventually be required to use those centrally supported systems as well.


The Harvard Library intends to modernize its implementation of a geospatial data access & discovery layer, establish a sustainable data loading workflow, and make geospatial data downloadable.

Business Value

The work proposed here meets a long-standing list of requests made by students, researchers, faculty and stakeholders over the course of several years. This project will follow the recommendation of the Harvard Geospatial Working Group and transition HGL from the current open source platform, OpenGeoportal (OGP, developed at Tufts), to a new open source platform, GeoBlacklight (GBL, developed primarily at Stanford). Harvard will become an active participant in the GBL community of users, which includes many peer institutions, among them 6 Ivy Plus members.


Creating a robust and sustainable environment through which maps and myriad forms of geospatial data can be discovered, explored and downloaded fulfills a core tenet of the Library’s mission and remediates an unstable and outdated data ingest solution. It is critical that the Library leverage those resources to reduce the practical costs of ownership and development and to increase its viability as a consortial partner in the GIS scholarly community.


II. Vision and Approach

The redesign of HGL will use the open source GeoBlacklight platform and establish a development-to-production environment for HGL based on LTS protocols and standards. The project will build on the knowledge gained from the S.T. Lee grant project, which used GeoBlacklight to deliver index maps, and will expand the offerings to include all the types of data that are now included in HGL. The redesign will preserve existing discovery capabilities of geospatial data from non-Harvard repositories as well as reaffirm its commitment to the extensibility of data ingest and discovery from sources beyond the Library. 

III. In Scope/Out of Scope

In Scope

Essential interface components

  • Authorization for restricted sets that doesn’t rely on AMS

  • Search of data using limits and facets on results

  • Relevance ranking and weighting - predefined

  • Index map display support

  • Index map facet for searching

  • Dataset preview on a map

  • Method to download vector and raster data as well as scanned maps

  • Method to download record metadata

  • Method to link back to individual record

Essential interoperability components

  • Method for providing a link from a HOLLIS record of a single data layer to the single record in HGL

  • Method for providing a link from a HOLLIS record of a collection of data layers to a search result in HGL with all the data layers

  • Method for making HGL records available in HOLLIS

  • Method for sending metadata records to OpenGeoMetadata (https://github.com/OpenGeoMetadata) on at least an intermittent basis

  • Preserve existing discovery capabilities of geospatial data from non-Harvard institutions and commitment to extensibility of data ingest and discovery beyond Harvard Library
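As an illustration of the two HOLLIS link items above, the following is a minimal sketch of how such links might be built, not a description of the chosen method. It assumes the standard Blacklight URL conventions that GeoBlacklight inherits (a record page at /catalog/<id> and facet filters passed as f[field][]=value). The host name, the slug, the collection name, and the choice of dct_isPartOf_sm as the collection facet are all hypothetical placeholders.

```python
from urllib.parse import quote, urlencode

# Hypothetical host; the real HGL base URL is not specified in this charter.
BASE_URL = "https://hgl.example.edu"


def record_link(slug):
    """Link to a single layer's record page (Blacklight-style /catalog/<id>)."""
    return f"{BASE_URL}/catalog/{quote(slug, safe='')}"


def collection_link(collection, facet_field="dct_isPartOf_sm"):
    """Link to a search result limited to one collection via a facet filter.

    The facet field name is an assumption (the GeoBlacklight 1.0 collection
    field); a local configuration could use a different field.
    """
    query = urlencode([(f"f[{facet_field}][]", collection)])
    return f"{BASE_URL}/catalog?{query}"


# Placeholder values, not real HGL identifiers.
print(record_link("harvard-example-layer-001"))
print(collection_link("Example Collection"))
```

Links of this shape could be stored in the HOLLIS record, so a single-layer record points at one HGL page and a collection record points at a filtered search.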

Essential infrastructure components

  • Dev/QA/Prod servers running GeoBlacklight

  • Solr index with current HGL data in GeoBlacklight Schema

  • Supported storage for index map GeoJSON files

  • Method for depositing data into GeoServer - and determining which data types will be supported

  • Data deposit method that is extensible to new spatial data sources outside of the Map Collection

  • Method for having developers/designers commit changes to interface and view

  • Evaluate current version of HGL GeoServer for compatibility with required functionality in GeoBlacklight

  • Evaluate need for database tables used for data export and download

  • Evaluate GeoCombine as a tool for managing standardized GIS metadata - to inform data publishing decisions

  • Evaluate and document a dev upgrade path for GeoServer and, if needed, its implications for data migration
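To make the "Solr index with current HGL data in GeoBlacklight Schema" item above more concrete, here is a rough sketch of what a minimal record and a required-field check might look like. It assumes version 1.0 of the GeoBlacklight metadata schema; the project may target a different schema version, and every value in the sample record is a made-up placeholder rather than real HGL data.

```python
import json

# Fields required by the GeoBlacklight 1.0 metadata schema (assumed version).
REQUIRED_FIELDS = (
    "dc_identifier_s",
    "dc_rights_s",
    "dc_title_s",
    "dct_provenance_s",
    "geoblacklight_version",
    "layer_slug_s",
    "solr_geom",
)


def envelope(west, east, north, south):
    """Format a bounding box in the ENVELOPE(W, E, N, S) syntax used by solr_geom."""
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitude out of range")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitude out of range or south > north")
    return f"ENVELOPE({west}, {east}, {north}, {south})"


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Hypothetical record; all values are placeholders.
record = {
    "dc_identifier_s": "example-layer-001",
    "dc_title_s": "Example layer",
    "dc_rights_s": "Restricted",
    "dct_provenance_s": "Harvard",
    "geoblacklight_version": "1.0",
    "layer_slug_s": "harvard-example-layer-001",
    "solr_geom": envelope(-71.2, -70.9, 42.4, 42.2),
}

print(json.dumps(record, indent=2))
print("missing:", missing_fields(record))
```

A check like this could run before records are loaded into Solr, so that layers migrated from the current HGL with incomplete metadata are flagged rather than silently indexed.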

Out of Scope

  • Preserving shopping cart feature from current HGL/OGP that allows for the selection of multiple files for download

  • Decision on metadata format - FGDC vs ISO 

  • Using persistent identifiers (URNs) for layer names and persistent links (URNs) in metadata

  • Preservation of vector data in DRS

  • Preservation of FGDC metadata in DRS

  • Automated method for sharing metadata records with OpenGeoMetadata

  • Web mapping services (WMS) and tile mapping services (TMS) 

  • Determining methods for reducing tile cache storage size

  • GeoServer upgrade - unless it’s for a critical need

  • Relevance ranking and weighting - user defined

  • Autosuggest with related terms

  • Making multiple formats available for ingest and export (GeoJSON, Geodatabase, GeoPackage, CSV) 

  • Making offline datasets discoverable 

  • Making geospatial data from Dataverse available for search and delivery

IV. Deliverables/Work Products  

  • An HGL solution that uses Harvard centralized systems for authentication and authorization of users who want to use licensed data sets.

  • A GeoBlacklight implementation of HGL that supports search, discovery, display, download and reuse of:

    • vector and raster datasets

    • georeferenced historical maps

    • index maps

  • An HGL solution that provides access to all data in the current HGL implementation

  • Supported and documented method for depositing data into HGL

  • Supported and documented method for storing new index map data for use in HGL

  • Supported and documented infrastructure for Dev/QA/Prod instances of HGL

  • Supported and documented methods for updates and upgrades to HGL components including GeoBlacklight, GeoServer, and Solr

  • Understanding of performance expectations related to rendering large historic maps  

  • Evaluation of need for custom database tables to support integration with Alma and downloads of DRS files

  • Evaluation of GeoCombine as a tool for managing standardized GIS metadata - to inform data publishing decisions

Definition of “Done”

The HGL/GeoBlacklight project will be considered done when:

  • Stakeholders accept that in-scope work has been delivered

  • Operations team has the tools to support system deployments and upgrades

  • HGL with GeoBlacklight front-end is deployed to production and accessible to users

  • All current HGL data layers are discoverable and deliverable

  • Stakeholders accept plan for GeoServer upgrade 

  • Documented plan to fully retire old HGL

V. Stakeholders and Project Team

Stakeholders

Stakeholder | Title | Participation
Bonnie Burns | Head of Geospatial Resources, Harvard Map Collection | Business Sponsor and Service Owner
Marc McGee | Geospatial Metadata Librarian | Product owner and metadata
GeoSpatial Working Group |  | Advisory and testing support
Stu Snydman | Associate University Librarian and Managing Director for Library Technology | Advisory


Project Team*

Team Member | Role(s) | Affiliation
Enrique Diaz | Project Co-Manager & Scrum Master | Head of Design & Development, DSI, HL
Paul Aloisio | Project Co-Manager | Systems Librarian, LTS, HUIT
Phil Plencner | Software Engineer | Senior Developer, DSI, HL
Tom Scorpa | Operational Resources | Production Operations Lead, LTS, HUIT
Marc McGee | Metadata Analyst & Product Owner | ITS, HL
Scott Walker | Business Analyst | 
Robin Wendler | Metadata Consultant | LTS


* Other team members may be added if work requires it


VI. Schedule

Phase | Phase Start | Phase End | Milestone | Milestone Date
Planning |  | 12/8/2020 | Charter approved | 12/8/2020
Preparation | 12/8/2020 | 1/19/2021 | Development environment provisioned, configured and running; evaluations completed; development assessment (go/no-go) | 1/19/2021
Development | 1/19/2021 | 3/30/2021 | Production-ready codebase ready for QA testing | 3/30/2021
Move to Production | 3/30/2021 | 4/13/2021 |  | Check ProdOps for release schedule


VII. Key tasks and outcomes

Tasks | Outcomes | Responsible Parties
Approve Project Charter | Agree on Project Charter with regard to stakeholders, scope, deliverables, and schedule | Business Sponsor
Meeting schedule | Sprint ceremonies | Project Co-Managers
Project infrastructure | Populate Jira project board; set up wiki page for LTS Operations; set up dev/qa environments; provision code repository | Project Co-Managers and Business Owner; Operational Resources
Development | Implementation of user stories, based on scope and deliverables from charter, reviewed and accepted by Product Owner | Project Team
Communication & Outreach planning | Demo to stakeholders, GeoSpatial Working Group, campus community; CC article, newsletter submission, promotion | Project Co-Managers
Move new HGL to production | HGL with GeoBlacklight is the public interface for HGL; OpenGeoportal interface shut down | Operational Resources and Project Team


VIII. Assumptions, Risks, and Constraints

Constraints

  • Cost: this project does not account for additional costs incurred by running multiple instances of beta and production in parallel

Assumptions

  • Stakeholders have identified the appropriate subject matter experts to participate in the project and who can accurately and completely define the business requirements for the project

  • Stakeholders will have made available the time required to participate in project activities and to complete tasks as requested

  • Project sponsor and other stakeholders are empowered to make the decisions required for the project to be a success

  • Existing GeoServer implementation is compatible with newest version of GeoBlacklight 

Risks

  • Risk: A new version of GeoServer may be necessary, either for essential in-scope GeoBlacklight functionality or because of a security audit failure, incurring additional costs
    Plan: Consult with community expertise; evaluate current GeoServer compatibility by generating a reference set of data to test against it
    Impact: New version will introduce unknowns around data migration, requiring new plans, adding operations costs, and impeding project velocity.
    Owner: Project team, Business Sponsor

  • Risk: New GeoBlacklight schema is released during this project’s development sprints with critical changes necessary for in-scope functionality
    Plan: Find tools that can convert to new schema; if they don’t exist, we either build the tools or evaluate eliminating some in-scope project requirements
    Impact: New specifications in a new GeoBlacklight schema required for in-scope functionality would require either reallocation of time and resources to address schema conversion or eliminating project must-haves affected by the change.
    Owner: Metadata Analyst

  • Risk: New version of GeoBlacklight with critical security fix released over course of project
    Plan: Install and test new version in development, assess severity of issues, if any
    Impact: Project cannot be deployed to production until security vulnerabilities are addressed
    Owner: Software Engineer

  • Risk: OpenGeoportal (OGP), currently running on Java 8, poses ongoing stability issues that could disrupt HGL availability
    Plan: Replace OGP with GeoBlacklight during this project; rely on in-house experience to remediate outages and evaluate their severity in the interim
    Impact: This is an existing risk, independent of this project. Its impact would disrupt the production instance of HGL, with the level of severity diminishing until ultimately eliminated once the project is completed.
    Owner: Project Co-Managers

  • Risk: Reliance on legacy authorization system (AMS)
    Plan: Integrate with HarvardKey directly using methods already developed in recent LTS/DSI projects
    Impact: Without authentication/authorization, restricted material would not be available for download
    Owner: Software Engineer


Appendix

Definitions of Roles


  • Business Owner - Provide vision and direction of product
  • Product Owner - Define, prioritize, and accept work done for project
  • Project Manager - Maintain project schedule and communication
  • Scrum Master - Lead, guide, and assist project team through development work
  • Business Analyst - Provide insight into user needs to inform and refine work stories