I. Problem/Value Statement
Problem Statement
The current DRS storage infrastructure (disk and tape) is in the final year of its service term and must be replaced. At the same time, DRS business owners would like to expand the range of storage options (e.g., cloud, external) and provide greater flexibility in replication policy.
Business Value
This storage refresh is critical for maintaining continuity of DRS service. It is also expected to lower provisioning and operational costs, which can then be reflected in reduced DRS pricing. Supporting a wider range of storage options will streamline future incorporation of new technical solutions. It also enables policy-driven replication, which better aligns curatorially-designated value and goals with the technical characteristics of the storage components that best ensure those goals. This in turn will reduce overall costs (by minimizing the number and type of replicas) and maximize use of finite resources (space no longer used for a copy of A can be used for a copy of B).
II. Vision and Approach
The provisioning of infrastructural capacity will place an emphasis on leveraging existing storage capabilities within the University (e.g., FAS RC) and consortial/commercial options outside the University (e.g., NESE, Iron Mountain). Conceptually, the DRS will now incorporate a storage broker architecture in which each file will be assigned a curatorially-designated storage classification that will control the variable degree of replication. New pricing will reflect the underlying provisioning/operating costs of the various storage options. Billing will reflect the differential prices of the various storage options utilized for a given file.
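As a rough illustration of the storage broker idea, the sketch below maps a curatorially-designated storage class to the set of storage options that should hold a replica. The class names (`high_value`, `standard`, `low_risk`) and the mappings are hypothetical and chosen only for illustration. The storage options are those named in this charter; the actual classification scheme is a project deliverable.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical storage classes and mappings, for illustration only.
# The real classification scheme is to be defined by the project.
REPLICATION_POLICY = {
    "high_value": (
        "ECS at Markley",
        "ECS at MGHPCC",
        "AWS S3 Infrequent Access",
        "Tape at NESE",
        "Tape at Iron Mountain",
    ),
    "standard": ("ECS at Markley", "ECS at MGHPCC", "Tape at NESE"),
    "low_risk": ("ECS at Markley", "Tape at NESE"),
}


@dataclass(frozen=True)
class DrsFile:
    file_id: str
    storage_class: str  # curatorially-designated classification


def replica_targets(f: DrsFile) -> tuple[str, ...]:
    """Return the storage options that should hold a copy of this file."""
    try:
        return REPLICATION_POLICY[f.storage_class]
    except KeyError:
        raise ValueError(f"unknown storage class: {f.storage_class!r}") from None
```

Under this arrangement, reclassifying a file changes only its `storage_class`; the replica locations follow from the policy table rather than being recorded per file.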
III. In Scope/Out of Scope
In Scope
Essential interface components
- BatchBuilder
- WebAdmin
Essential interoperability components
- Starfish storage management infrastructure
Essential infrastructure components
- HUIT Research Computing managed storage infrastructure at Markley
- FAS RC storage infrastructure at MGHPCC
- AWS S3 Infrequent Access
- Tape infrastructure at NESE
- Tape warehousing at Iron Mountain
- Snowball devices for movement of deliverable content to S3
- Oracle DB
- Fscheckd monitoring script
- Quarterly billing script
Out of Scope
- Interoperability with storage at FAS RC's HPC cluster at MGHPCC
- Interoperability with USC DR
IV. Deliverables/Work Products
- HUIT Research Computing ECS at Markley
- FAS RC ECS at MGHPCC
- AWS S3 Infrequent Access
- Tape at NESE
- Tape at Iron Mountain
- Encryption of level 4 content into the tape environments
- Definition of OCFL structure for ECS file system and S3 object storage
- Retrospective classification of the storage class of all existing files
- Extension to DB schema
- BatchBuilder and WebAdmin support for storage (re)classification
- New pricing scheme based on partial cost recovery
- Revised quarterly billing script supporting non-uniform storage replication
- Updated BatchBuilder and WebAdmin online documentation and training material
- Switch to a different SFTP client and updates of depositor scripts to interact with ECS-hosted drop boxes; updates to "remote dropbox" setups that exist for Media Preservation, Imaging Services and Harvard Art Museums; related training, testing and documentation
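To make "billing supporting non-uniform storage replication" concrete, here is a minimal sketch in which each file is charged for the replicas it actually has, instead of a flat rate that assumes uniform replication. The storage keys (`ecs`, `s3_ia`, `tape`), the per-GB prices, and the function names are all hypothetical; real prices depend on the approved pricing scheme.

```python
from decimal import Decimal

# Hypothetical quarterly prices per GB, for illustration only.
PRICE_PER_GB = {
    "ecs": Decimal("0.010"),
    "s3_ia": Decimal("0.004"),
    "tape": Decimal("0.002"),
}


def quarterly_charge(size_gb, replicas):
    """Charge for one file: its size times the price of each replica it has."""
    size = Decimal(str(size_gb))
    return sum((size * PRICE_PER_GB[r] for r in replicas), Decimal("0"))


def bill(files):
    """Total the charges per owner.

    files: iterable of (owner, size_gb, replicas) tuples, where replicas
    lists one storage key per copy (a key may repeat, e.g. two ECS copies).
    """
    totals = {}
    for owner, size_gb, replicas in files:
        charge = quarterly_charge(size_gb, replicas)
        totals[owner] = totals.get(owner, Decimal("0")) + charge
    return totals
```

`Decimal` is used instead of floating point so that currency totals do not accumulate binary rounding error.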
Definition of "Done"
The DRS Refresh project will be considered done when:
- New hardware/software components are fully deployed
- Migration of all retrospective content to new storage environment
- Disposition of all prospective content to new storage environment
- BatchBuilder and WebAdmin updates are complete and training and documentation describe how to specify/update curatorial storage class attribute
- Dynamic storage reallocation (delete/copy as necessary) fully functional at point of initial deposit and on an ad hoc basis via WebAdmin
- Fscheckd monitoring script re-written and pointed at all new storage options
- VPDR/LLT approval of new pricing scheme in consultation with stakeholders
- Quarterly billing script supports non-uniform storage replication
- Decommissioning and removal of old equipment
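The "delete/copy as necessary" step of dynamic storage reallocation can be sketched as a set difference between where a file's replicas are now and where its (re)assigned storage class says they should be. The function name and the location labels below are hypothetical.

```python
def reallocation_plan(current, target):
    """Compare current replica locations with the target locations.

    Returns the locations that need a new copy and the locations whose
    copy is no longer required. Locations present in both are untouched.
    """
    current, target = set(current), set(target)
    return {
        "copy_to": sorted(target - current),
        "delete_from": sorted(current - target),
    }


# Example: a file moves from a three-copy class to a two-copy class.
plan = reallocation_plan(
    current={"ecs_markley", "ecs_mghpcc", "tape_nese"},
    target={"ecs_markley", "s3_ia"},
)
```

A cautious implementation would presumably complete and verify every copy in `copy_to` before acting on `delete_from`, so that a file never drops below its intended replica count mid-operation.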
V. Stakeholders and Project Team
Stakeholders
Stakeholder | Title | Participation |
Stephen Abrams | Head of digital preservation / DRS business owner | Use cases, requirements, conceptual design |
Stewardship Standing Committee | | Review/comment |
DRS collection managers | | Review/comment/training |
Harvard media and image digitization and preservation practitioners | | Review/comment/test |
Project Team**
Team Member | Role(s) | Affiliation |
Stephen Abrams | Business owner | DPS |
Tricia Patterson | Business owner | DPS |
Vitaly Zakuta | Project manager/Scrum master/Analyst | LTS |
Anthony Moulen | Architect | LTS |
Andrew Woods | Consultant | LTS |
Sharon Bayer | Infrastructure project manager | LTS |
Chris Vicary | Technical lead / software engineer | LTS |
David Neiman | Software engineer | LTS |
Jessica Jassal | Software engineer | LTS |
Valdeva Crema | Software engineer | LTS |
Tom Scorpa | ProdOps lead / storage manager / systems administrator | LTS |
Jason Knight | ProdOps / systems administrator | LTS |
Benson Smith | DB admin | LTS |
Julie Wetherill | Agile product owner / Analyst / QA / Documentation/training | LTS |
Janet Taylor | UI/UX (4/27-on) | LTS |
** Other team members may be added if work requires it
VI. Schedule
Phase | Phase Start | Phase End | Milestone | Milestone Date |
Planning | 3/9 | 3/29 | Project Charter approved by all stakeholders | |
Preparation | 3/30 | 4/12 | Technical design complete | 4/12 |
Development | | | All development tasks are complete | |
Move to Production | | | Move to production complete and accepted by stakeholders | |
VII. Key tasks and outcomes
Tasks | Outcomes | Responsible Parties |
Approve Project Charter | Agree on Project Charter with regards to: | Business Sponsor |
Meeting schedule | Sprint ceremonies | Project Manager |
Project infrastructure | | Project Manager, Tech Lead, Architect, ProdOps |
Development | | Project Team |
Communication & Outreach planning | | Project Manager; Business Owner |
Move to production | | Project Team |
VIII. Assumptions, Risks, and Constraints
Constraints
- Service contracts on existing hardware expire in September 2021
- Updated software needs to be in production before start of Fall Semester 2021
Assumptions
- Prior completion of BatchBuilder Java upgrade
Risks
- Risk: Work extends beyond the August 31 expiration of existing hardware
- Plan:
- Impact:
- Owner:
Appendix
Definitions of Roles
- Business Owner - Provide vision and direction of product
- Product Owner - Define, prioritize, and accept work done for project
- Project Manager - Maintain project schedule and communication
- Scrum Master - Lead, guide, and assist project team through development work
- Business Analyst - Provide insight into user needs to inform and refine work stories
- Technical Lead - Lead technical design and development
- Architect - Provide technical architecture for the solution
- Software Engineer - Update and build software to accommodate new storage architecture
- Production Operations - Administer new storage solution system and provide insight into its operation
- DB Admin - Administer databases to accommodate needed functionality and facilitate needed changes
- UI/UX - Create wireframes / mockups of new/updated UI components, and provide guidance on usability of new/updated functionality
Glossary
- FAS RC – Faculty of Arts & Sciences Research Computing (https://www.rc.fas.harvard.edu/)
- MGHPCC – Massachusetts Green High-Performance Computing Center (https://www.mghpcc.org/)
- NESE – Northeast Storage Exchange (https://nese.mghpcc.org/)
- OCFL – Oxford Common File Layout (https://ocfl.io/)
- USC DR – University of Southern California Digital Repository (https://repository.usc.edu/)