Alma Matching Algorithms

Overview

For general duplicate detection (e.g. manually finding matches in the institutional repository by using MDE Find Matches), we are using Alma's Title Statement Extended Fuzzy Matching, defined below. The Ex Libris documentation for all of the match methods is available here.

Note that:

  • For exports from Connexion, matching is based solely on OCLC number in 035. This was a Harvard configuration decision. 
  • For data loads (Alma import profiles), each profile is defined separately and should use the Alma matching algorithm most appropriate for that data load.

Title Statement Extended Fuzzy

Sequence: First attempt at match

  • Criteria: match on one or more of:
      022 ISSN
      020 ISBN
      010 LCCN
      CODEN
      OCLC unique number
      Other system number (035 field)
  • Year (008 Date 1): must be within 1 year
  • Format (LDR/06): must match
  • Next step: if no match is found after the first attempt, the system proceeds to the next step. Note: if each record has an identifier and they do not match, or one lacks an identifier, the process continues to the next step.

Sequence: Second

  • Criteria: author (1) + title (2)
  • Year (008 Date 1): must be within 1 year
  • Format (LDR/06): must match
  • Next step: if the record has no author, skip to the next step

Sequence: Third

  • Criteria: title (2)
  • Year (008 Date 1): must be within 1 year
  • Format (LDR/06): must match
  • Next step: none
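
The three-step sequence above can be sketched in Python as follows. This is only an illustration of the decision order, not Alma's implementation: the `Rec` class, its field names, and `match_step` are hypothetical. Two points are assumptions rather than documented behavior: a record with no 008 Date 1 is treated as failing the year test, and when both records have an author but author + title do not match, the sketch stops instead of falling through to the title-only step.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rec:
    """Hypothetical flattened view of a bibliographic record."""
    identifiers: set = field(default_factory=set)  # e.g. {("020", "123")}
    author: Optional[str] = None   # key built from author criteria (1)
    title: str = ""                # key built from title criteria (2)
    year: Optional[int] = None     # 008 Date 1
    fmt: str = ""                  # LDR/06


def year_and_format_ok(a: Rec, b: Rec) -> bool:
    # Every step requires year within 1 and an identical LDR/06.
    if a.year is None or b.year is None:
        return False  # assumption: a missing date cannot be "within 1 year"
    return abs(a.year - b.year) <= 1 and a.fmt == b.fmt


def match_step(a: Rec, b: Rec) -> Optional[str]:
    """Return 'first', 'second' or 'third' for the step that matched, else None."""
    if not year_and_format_ok(a, b):
        return None
    # First: any shared identifier (ISSN, ISBN, LCCN, CODEN, OCLC, other 035).
    if a.identifiers & b.identifiers:
        return "first"
    # Second: author + title, only when both records carry an author.
    if a.author and b.author:
        if a.author == b.author and a.title == b.title:
            return "second"
        return None  # assumption: no fall-through to the title-only step
    # Third: title only, reached when a record has no author.
    if a.title and a.title == b.title:
        return "third"
    return None
```

For example, two records with no shared identifier, the same author and title keys, dates of 2000 and 2001, and the same LDR/06 value would be reported as matching at the second step under these assumptions.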

 

(1) Author criteria: 100 a-d,j,q,u; 110 a-e,n,u; 111 a,c-e,n,q,u; 700 a-d,j,q,u; 710 a-e,I,n,u; 711 a,c-e,I,j,n,q,u

(2) Title criteria: 245 a,b,k,n,p (including 880/245)

Title matching does not support partial matches ("match within"): a shorter title does not match a longer title that contains it. For example, these two titles will not match:

245 $a Paradiso / $c ...
245 $a Paradiso $b : I-XVII : edizone critica ....

because the full contents of 245 a,b,k,n,p in the two records do not match exactly.
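
The whole-title comparison described above can be illustrated with a small Python sketch. The helper names (`title_key`, `titles_match`) are hypothetical, and the normalization step (dropping punctuation and case) is an assumption, since this page does not describe how Alma normalizes titles. What matters is that the comparison is on the whole key, never on a prefix or substring.

```python
import re

TITLE_SUBFIELDS = {"a", "b", "k", "n", "p"}  # title criteria (2)


def title_key(subfields):
    """Build one comparison key from 245 subfields, given as (code, value) pairs."""
    text = " ".join(value for code, value in subfields if code in TITLE_SUBFIELDS)
    # Assumed normalization: punctuation to spaces, collapse whitespace, ignore case.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split()).casefold()


def titles_match(x, y):
    # Whole-key equality only: no "match within".
    return title_key(x) == title_key(y)


shorter = [("a", "Paradiso /"), ("c", "...")]
longer = [("a", "Paradiso"), ("b", ": I-XVII : edizone critica ....")]

print(titles_match(shorter, longer))  # the two example titles above
```

Even with punctuation ignored, the key for the first record is only the $a text, while the key for the second also includes the $b text, so the two keys differ.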

Extended Fuzzy

Same functionality as Title Statement Extended Fuzzy, except that title matching includes the following fields (and their linked 880 fields):

  • 245 a
  • 245 a,b,k,n,p
  • 210 a
  • 246 a
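
One possible reading of the list above is that each of these fields contributes a candidate title, and two records match on title when any candidate is shared. That reading is an assumption, not something this page or the quoted documentation confirms. The Python sketch below illustrates it with hypothetical helper names and a hypothetical record layout, and it leaves out linked 880 fields for brevity.

```python
import re


def norm(text):
    # Assumed normalization (not described in the source).
    return " ".join(re.sub(r"[^\w\s]", " ", text).split()).casefold()


def joined(field, codes):
    """Join the values of the wanted subfield codes from one field."""
    return norm(" ".join(value for code, value in field if code in codes))


def candidate_titles(record):
    """record: dict mapping a MARC tag to a list of fields,
    each field being a list of (code, value) pairs (hypothetical layout)."""
    keys = set()
    for f in record.get("245", []):
        keys.add(joined(f, {"a"}))                      # 245 a
        keys.add(joined(f, {"a", "b", "k", "n", "p"}))  # 245 a,b,k,n,p
    for tag in ("210", "246"):
        for f in record.get(tag, []):
            keys.add(joined(f, {"a"}))                  # 210 a, 246 a
    keys.discard("")
    return keys


def extended_titles_match(x, y):
    # Assumption: one shared candidate title is enough.
    return bool(candidate_titles(x) & candidate_titles(y))
```

Under this assumed reading, records whose 245 $a values agree would share a candidate title even when their full 245 a,b,k,n,p strings differ.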

ISBN match method (exact subfield match)

The Ex Libris documentation for this match method is not terribly clear. This is how it works: 

  • 020 $a of Alma record will match to 020 $a of the incoming record
  • 020 $z of Alma record will match to 020 $z of the incoming record
  • 020 will NOT match to 776 (any subfield)
  • 776 will NOT match to 776 (any subfield)
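
The four rules above amount to comparing 020 values subfield by subfield and ignoring 776 entirely. A minimal Python sketch, assuming a hypothetical record layout of (tag, subfield code, value) triples and an assumed clean-up of the ISBN value (first token only, hyphens removed) that is not taken from the Ex Libris documentation:

```python
def isbn_values(record, code):
    """Collect normalized 020 values for one subfield code.

    record: list of (tag, subfield_code, value) triples (hypothetical layout).
    """
    values = set()
    for tag, sub, value in record:
        if tag == "020" and sub == code and value.strip():
            # Assumed clean-up: keep the first token, drop hyphens,
            # so "978-0-... (pbk.)" compares equal to "978...".
            values.add(value.split()[0].replace("-", ""))
    return values


def isbn_exact_subfield_match(alma, incoming):
    # $a is compared only with $a, and $z only with $z; 776 is never consulted.
    return any(
        isbn_values(alma, code) & isbn_values(incoming, code)
        for code in ("a", "z")
    )
```

With this sketch, an Alma record holding an ISBN in 020 $a matches an incoming record with the same ISBN in 020 $a, but not one that carries it only in 020 $z or in 776.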