Alma Matching Algorithms
Overview
For general duplicate detection (e.g. manually finding matches in the institutional repository with MDE Find Matches), we use Alma's Title Statement Extended Fuzzy matching, defined below. The Ex Libris documentation for all of the match methods is available here.
Note that:
- For exports from Connexion, matching is based solely on the OCLC number in 035. This was a Harvard configuration decision.
- For data loads (Alma import profiles), each profile is defined separately and should use the Alma matching algorithm most appropriate for that data load.
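As an illustration of the Connexion case, the sketch below compares two records on the OCLC numbers found in their 035 $a values. It is not Alma's implementation: the function names are hypothetical, and dropping the `ocm`/`ocn`/`on` prefixes and leading zeros is an assumption about normalization, not something stated in the Ex Libris documentation.

```python
import re

# Matches values such as "(OCoLC)12345" or "(OCoLC)ocm00012345".
# ASSUMPTION: prefixes and leading zeros are ignored when comparing.
_OCLC_RE = re.compile(r"^\(OCoLC\)\s*(?:ocm|ocn|on)?0*(\d+)$")


def oclc_numbers(values_035a):
    """Return the set of bare OCLC numbers found in a list of 035 $a strings."""
    numbers = set()
    for value in values_035a:
        m = _OCLC_RE.match(value.strip())
        if m:
            numbers.add(m.group(1))
    return numbers


def oclc_match(alma_035a, incoming_035a):
    """True when the two records share at least one OCLC number."""
    return bool(oclc_numbers(alma_035a) & oclc_numbers(incoming_035a))
```

Non-OCLC 035 values (for example, a vendor control number) are simply ignored in this sketch.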
Title Statement Extended Fuzzy
Sequence | Criteria | Year (008 Date 1) | Format LDR/06 | Next step |
---|---|---|---|---|
First attempt at match | Match on one or more of: | Must be within 1 year | Must match | If no match is found after the first attempt, the system proceeds to the next step. Note: if each record has an identifier and they do not match, or one lacks an identifier, the process continues to the next step. |
Second | Author (1) + title (2) | Must be within 1 year | Must match | If record has no author, skip to next step |
Third | Title (2) | Must be within 1 year | Must match | None |
(1) Author criteria: 100 a-d,j,q,u; 110 a-e,n,u; 111 a,c-e,n,q,u; 700 a-d,j,q,u; 710 a-e,l,n,u; 711 a,c-e,l,j,n,q,u
(2) Title criteria: 245 a,b,k,n,p (including 880/245)
For titles, there is no partial ("match within") matching: one title contained inside another does not count as a match. For example, these two titles will not match:
245 $a Paradiso / $c ...
245 $a Paradiso $b : I-XVII : edizone critica ....
because the full 245 a,b,k,n,p strings of the two records do not match exactly.
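The three-step sequence in the table above can be sketched roughly as follows. This is a simplified model, not Alma's code: the `Rec` class and `match_step` function are hypothetical, the author and title strings are assumed to be already extracted from the subfields listed in notes (1) and (2), and the sketch assumes that the title-only step is reached only when a record lacks an author.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rec:
    title: str                       # 245 a,b,k,n,p, already joined
    year: int                        # 008 Date 1
    fmt: str                         # LDR/06
    author: Optional[str] = None     # author string per note (1), if any
    identifiers: frozenset = frozenset()


def match_step(incoming: Rec, existing: Rec) -> Optional[str]:
    """Return the name of the step that matched, or None."""
    # Year and format conditions apply to every step.
    if abs(incoming.year - existing.year) > 1 or incoming.fmt != existing.fmt:
        return None
    # First attempt: at least one shared identifier.
    if incoming.identifiers & existing.identifiers:
        return "identifier"
    # Second: author + title, only when both records have an author.
    if incoming.author and existing.author:
        if incoming.author == existing.author and incoming.title == existing.title:
            return "author+title"
        return None  # ASSUMPTION: no fall-through to title-only here
    # Third: title only, compared as a whole string (no partial match).
    if incoming.title == existing.title:
        return "title"
    return None


a = Rec(title="Paradiso", year=2000, fmt="a")
b = Rec(title="Paradiso : I-XVII", year=2000, fmt="a")
print(match_step(a, b))  # None: the full title strings differ
```

Because titles are compared as whole strings, "Paradiso" does not match a longer title that merely begins with "Paradiso".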
Extended Fuzzy
Same functionality as Title Statement Extended Fuzzy, except that title matching includes the following (plus the corresponding 880 fields):
- 245 a
- 245 a,b,k,n,p
- 210 a
- 246 a
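One way to picture the wider title criteria is as a set of candidate title strings per record. The sketch below is only an interpretation: it assumes that two records' titles match when any candidate from one equals any candidate from the other, and the `normalize` step (lowercasing and dropping trailing ISBD punctuation) is a stand-in for whatever normalization Alma actually applies. The function names and the record shape are hypothetical, and 880 handling is omitted.

```python
TITLE_SUBFIELDS = {"a", "b", "k", "n", "p"}


def normalize(text):
    # ASSUMPTION: illustrative normalization only.
    return text.strip().rstrip(" /:;,.=").lower()


def title_keys(fields):
    """fields: list of (tag, [(subfield_code, value), ...]) tuples."""
    keys = set()
    for tag, subfields in fields:
        if tag == "245":
            a_only = " ".join(v for c, v in subfields if c == "a")
            full = " ".join(v for c, v in subfields if c in TITLE_SUBFIELDS)
            keys.update({normalize(a_only), normalize(full)})
        elif tag in ("210", "246"):
            keys.add(normalize(" ".join(v for c, v in subfields if c == "a")))
    keys.discard("")  # ignore fields with no usable subfields
    return keys


def titles_match(fields_1, fields_2):
    return bool(title_keys(fields_1) & title_keys(fields_2))
```

Under these assumptions, a record with 245 $a "Paradiso /" and one with 245 $a "Paradiso" plus a $b would match on the shared 245 $a, which they would not do under Title Statement Extended Fuzzy.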
ISBN match method (exact subfield match)
The Ex Libris documentation for this match method is not terribly clear. This is how it works:
- 020 $a of Alma record will match to 020 $a of the incoming record
- 020 $z of Alma record will match to 020 $z of the incoming record
- 020 will NOT match to 776 (any subfield)
- 776 will NOT match to 776 (any subfield)
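The rules above can be summarized in a short sketch. It is illustrative only: the record shape (a dict of tag to subfield code to list of values) and the function name are hypothetical, and it assumes that "exact subfield match" means 020 $a is never compared with 020 $z.

```python
def isbn_match(alma, incoming):
    """True when the records share a value in 020 $a, or share one in 020 $z."""
    alma_020 = alma.get("020", {})
    incoming_020 = incoming.get("020", {})
    for code in ("a", "z"):
        # Compare like with like: $a against $a, $z against $z.
        if set(alma_020.get(code, [])) & set(incoming_020.get(code, [])):
            return True
    # 776 is never consulted, so 020/776 and 776/776 pairs cannot match.
    return False
```

For example, a record carrying an ISBN only in 776 $z will not match an incoming record carrying the same ISBN in 020 $a.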