Metadata Optimization Project

Project purpose and goals

Library services rely on accurate metadata to ensure that faculty, students, and researchers can discover, access, and use the library materials they need for research and education. Library metadata standards, strategies, and schemas are undergoing dramatic change due to the changing nature of library collections and data creation workflows, and to evolving expectations for interoperability and reuse of metadata. As Harvard Library embarks on a migration to a new library processing platform, the Access and Discovery Standing Committee spearheaded a project to enhance and upgrade metadata in Aleph, SFX, and Verde to achieve two key objectives:

1) optimize the ease and accuracy of metadata migration from Aleph to Alma

2) increase the quality and efficacy of resource sharing for patrons at Harvard and beyond 


Subproject: Alma Migration | OCLC Synchronization | Other Corrections (most will use automated scripts)
Target date for subproject completion: Dec. 2017 | Aug. 2017 | Dec. 2017
No. error conditions identified: 68 | 130 | 292
No. error conditions defined as priorities: 100
No. error conditions resolved: 4 | 4 | 0
Estimated no. records impacted: unknown | 300,000 | Some error conditions will impact all bibliographic and holdings records (approx. 30 million)
Target for no. of record corrections: 400,000 | 90,000
No. record corrections completed: 11,000 | 53,400 | 0
Revised forecast for no. corrections by end of project: 400,000 | 90,000 | 30 million records (all)


Weekly Update & Dashboard

June 28, 2017

Estimated number of Aleph bibliographic records to be corrected: 300,000

Modifications to Date

Type | Est. Count | % by Batch
Bibliographic | 1,840,942 | 99
Holding | 9,946 | 98
Item | 1,546 | 62
Orders | 1,034 | 96

 

In the works:

See prior updates and project status for more details.

 

Phase / Issue | Status / % Complete | Est. No. of Records | Aleph Records Corrected
1: Paired field errors (subfield $$6) | DATA CORRECTION, 84% | Bibs: 29,396 | 25,749
1: 'Sparse' records | DATA CORRECTION, 51% | Bibs: 21,538 | 11,144
1: Missing/Problematic titles (245) | COMPLETE, 98.5% | Bibs: 855 | 842
1: Invalid identifiers | PLANNING & ANALYSIS | Bibs: > complete db check | Bibs: 0
1: Form of item conflict: Invalid combination LDR/06 & LDR/07 | PLANNING & ANALYSIS | Bibs: 8,071 (revised est.) | 8,291
1: Form of item conflict: Non-standard or obsolete codes in LDR/06 (Record Type) | PLANNING & ANALYSIS, 29.28% | Bibs: 15,413 | 4,578
1: Form of item conflict: Non-standard or obsolete codes in 008/23 (BK) | PLANNING & ANALYSIS | Bibs: 87 | 0
1: Form of item conflict: Non-standard or obsolete codes in 008/23 (CF) | PLANNING & ANALYSIS | Bibs: 61 | 0
1: Form of item conflict: Non-standard or obsolete codes in 008/23 (MX) | PLANNING & ANALYSIS | Bibs: 3 | 0
1: Form of item conflict: Non-standard or obsolete codes in 008/23 (MX) | PLANNING & ANALYSIS | Bibs: 12 | 0
1: Form of item conflict: Non-standard or obsolete codes in 008/29 (VM) | DATA CORRECTION | Bibs: 42,229 | 0
1: Form of item conflict: Non-standard or obsolete codes in 008/29 (MP) | DATA CORRECTION | Bibs: 20 | 0
Alma: Active holdings on deleted/suppressed bibs | DATA CORRECTION; Bibs: 24.86%; Hol: 17.66% | Holdings: 15,575; Bibs: 16,031 | 2,751 / 3,625
1: Set lending byte | PLANNING & ANALYSIS | Holdings: 13,588,541 - Unknown policy; 514,505 - Invalid values | 0
1: Set reproduction byte | PLANNING & ANALYSIS | Holdings: 13,669,916 - Unknown policy; 515,184 - Invalid values | 0
1: Set retention byte | PLANNING & ANALYSIS | Holdings: 140,442 - Unknown policy; 9,045,492 - Invalid values | 0
0: LDR - Invalid length | DATA CORRECTION, 95% | Bibs: 376 | 348
0: LDR/05 (Record status) - Invalid codes | COMPLETE, 90.49% | Bibs: 10,551 | 9,548
0: Orphan bibs | PLANNING & ANALYSIS, 4.75% | Bibs: 41,000 | 1,949
0: 008 (Fixed field) missing | COMPLETE, 100% | Bibs: 36 | 36
0: 008 (Fixed field) - Invalid length | DATA CORRECTION, 11% | Bibs: 918 | 102
0: 008/06 (Date Type) - Invalid codes | DATA CORRECTION, 97% | Bibs: 20,398 | 19,786
0: 008/15-17 (Language - Invalid & obsolete codes) - All formats except music/sound | DATA CORRECTION, 27% | Active Bibs: 8,554 | 2,270
0: 008/15-17 (Language - Invalid codes or blank) - Sound recordings | DATA CORRECTION, 72% | Bibs: 30,256 | 21,683
0: 008/21 (Music Parts) - Invalid codes | PENDING HL | Bibs: 1,146 (1,143 have obsolete code 'a') |
0: 008/33 LitF (BK) - Invalid codes | PLANNING & ANALYSIS | Active Bibs: 21,858 contain invalid codes; 1,217 contain an obsolete code | 0
0: 008/33 Alph (SE) - Invalid codes | PLANNING & ANALYSIS | Active Bibs: 406 | 0
0: 008/33 TMat (VM) - Invalid codes | PLANNING & ANALYSIS | Active and suppressed bibs: 42,257 contain invalid or obsolete codes | 0
0: 008/35-37 Place of publication - Invalid & obsolete codes | PLANNING & ANALYSIS | Active Bibs: 152,697 | 0
0: 260 Imprint missing | PLANNING & ANALYSIS | Judaica Manual by unit |
0: Obsolete 440 convert series statement to 490/830 | COMPLETE, 100% | Bibs: 1,161,189, Approx. 1.2m tags | 1,724,769
Alma: Item Processing Status - invalid codes | COMPLETE, 100% | Items: 1 | 1
Alma: Item Status - invalid codes | PENDING HL, 19% | Items: 226 | 43
Alma: Items: Material Type - invalid codes | COMPLETE, 100% | Items: 930 | 929
Alma: Order Status - invalid codes | COMPLETE, 100% | Orders: 1,010 | 1,010
Alma: Acquisition Method - invalid codes | COMPLETE, 100% | Orders: 23 | 23
Alma: Open Orders | Orders to be closed to be identified by ASWG as part of test load process. | |
Alma: RLN conversion | | Holdings: 10,260. Debbie corrected 9,263 holdings. She will forward list of holdings that could not be resolved to contact at sublibraries for resolution. | 9,263
Alma: LDR07 with Harvard-defined '9' | COMPLETE, 100% | Bibs: 17 | 17
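The percent-complete figures on the dashboard appear to relate records corrected to the estimated number of records. The following is a minimal sketch of that arithmetic, assuming the percentage is simply corrected divided by estimated; the function name `percent_complete` is hypothetical, and some dashboard rows report figures that do not match this simple ratio, so the project may calculate them differently.

```python
def percent_complete(corrected: int, estimated: int) -> float:
    """Return corrected/estimated as a percentage, rounded to two decimals.

    Assumption: percent complete is a plain ratio of the two counts.
    """
    if estimated <= 0:
        raise ValueError("estimated must be a positive record count")
    return round(100 * corrected / estimated, 2)


# (issue, estimated records, records corrected), copied from dashboard rows
rows = [
    ("008 (Fixed field) missing", 36, 36),
    ("008/06 (Date Type) - Invalid codes", 20_398, 19_786),
    ("Orphan bibs", 41_000, 1_949),
]

for issue, estimated, corrected in rows:
    print(f"{issue}: {percent_complete(corrected, estimated)}%")
```

For these three rows the ratio agrees with the reported 100%, 97%, and 4.75%.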

Project Approach

This project is funded by Harvard Library and will be managed by a project team from Library Technology Services. The project team will gather information from Harvard Library staff, the OCLC Data Sync Working Group, and the Metadata Standards Working Group about current and historic coding practices and workflows in Aleph, SFX, and Verde. Using that information, the team will identify and recommend metadata candidates for normalization and improvement, define and recommend remediation objectives and strategies, and complete remediation of prioritized data issues.

An oversight committee representing Library Technology Services, the Access & Discovery Standing Committee, and Harvard Library ITS and Access Services will prioritize work and approve data remediation strategies.

Objectives, strategies, and recommendations will be informed by information gathered from experts across Harvard Library.

  • Gather information on current and historic data and workflows
  • Conduct metadata analysis to identify candidates for correction
  • Prioritize candidate data elements to be corrected
  • Design and execute data correction processes, using automated processing as much as possible
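As an illustration of what automated analysis for some of the dashboard's error conditions could look like, here is a minimal sketch that flags an invalid leader length, an invalid LDR/05 record status code, and a missing or wrong-length 008. It is not the project's actual tooling: the function name `find_errors` and the plain-string representation of the leader and 008 are assumptions made for the example. The fixed lengths (24 and 40) and the LDR/05 codes come from the MARC 21 bibliographic format.

```python
from typing import List, Optional

LEADER_LENGTH = 24                   # MARC 21 leader is always 24 characters
FIELD_008_LENGTH = 40                # MARC 21 bibliographic 008 is 40 characters
VALID_RECORD_STATUS = set("acdnp")   # LDR/05 codes for bibliographic records


def find_errors(leader: str, field_008: Optional[str]) -> List[str]:
    """Return the names of the error conditions found in one record.

    Hypothetical helper: the record is assumed to arrive as a leader
    string plus an 008 string (None when the field is absent).
    """
    errors = []
    if len(leader) != LEADER_LENGTH:
        errors.append("LDR - Invalid length")
    # Position 05 can only be checked if the leader is long enough.
    if len(leader) > 5 and leader[5] not in VALID_RECORD_STATUS:
        errors.append("LDR/05 (Record status) - Invalid code")
    if field_008 is None:
        errors.append("008 (Fixed field) missing")
    elif len(field_008) != FIELD_008_LENGTH:
        errors.append("008 (Fixed field) - Invalid length")
    return errors


# A well-formed leader with a 40-character 008, then a damaged record.
print(find_errors("00000nam a2200000 a 4500", " " * 40))
print(find_errors("00000xam a22", None))
```

A check like this only identifies candidate records. Deciding the correct replacement value would still depend on the remediation strategy approved for each error condition.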

Deliverables

  • A list of key areas to correct or improve, prioritized by impact on users
  • A communication strategy and dashboard to monitor progress
  • Final summary report

The project will be complete when accepted recommendations have been implemented and correction projects defined as highest priority have been completed.

Project Team

Lynn Stram, Metadata Migration Analyst, Library Technology Services (lead)
Corinna Baksik, Library Technology Services
Michael Edwards, Library Technology Services
Laura Morse, Library Technology Services
Allison Powers, Library Technology Services 
Additional Term Project Resources (tbd, Library Technology Services, Information and Technical Services) 

Oversight Committee

Michelle Durocher, Information and Technical Services
Laura Morse, Library Technology Services
Ken Peterson, Access Services
Tracey Robinson, Library Technology Services
Scott Wicks, Information and Technical Services
Suzanne Wones, Harvard Library

Timeline

A preliminary schedule has been created, but it may need adjustment based on the outcome of the analysis, hiring patterns for project staff, and dependencies on the OCLC Data Synchronization Project:

May 2016 – September 2016: Information gathering, data review, and development of the database remediation project list.
October 2016: Finalize priorities for remediation, communication plan, and dashboard.
November – December 2016: Determine options for data remediation initiatives.
December 2016 – April 2017: Execute data remediation.
April 2017 – May 2017: Prepare final report.

Charter

Metadata Optimization Charter

Resource Links 

PowerPoint Slides from Public Presentations (October 2016)

 

Contact Team Members:

mop@hulmail.harvard.edu

Milestone Timeline

May-September 2016: Information Gathering
October 2016: Finalize Phase 1 Priorities
November-December 2016: Define and Execute Phase 1 Remediation Options
January 2017: Finalize Phase 2 Priorities
February 2017: Define and Execute Phase 2 Remediation Options
February 2017: Finalize Phase 3 Priorities
May 2017: Define and Execute Phase 3 Remediation Options
May 2017: Finalize Phase 4 Priorities
April-May 2017: Final Report
June 2017: Define and Execute Phase 4 Remediation Options