Record clustering configuration details
Technical overview
We use Primo's FRBR vector feature to support clustering in search results. We have defined FRBR keys locally to determine which records should cluster. The data elements used to build those keys are detailed below.
Keys consist of two parts: a numeric identifier and a title. Keys are created by joining every identifier with every title, in all possible combinations. If a record lacks either part, no key is created. During clustering, the keys of different records are compared. If a record shares a key with another record, it is added to that record's group. Once a match is found, the system stops searching for further matches, since a record can belong to only one group.
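The key-building and grouping steps described above can be sketched in Python. This is an illustrative sketch only, not Primo's actual implementation: the function names `build_keys` and `assign_groups` are hypothetical, and the `~` separator is taken from the example keys shown later on this page.

```python
from itertools import product


def build_keys(titles, identifiers):
    """Join every title with every identifier (hypothetical helper).

    If either list is empty, the product is empty and no keys are created.
    """
    return {f"{title}~{ident}" for title, ident in product(titles, identifiers)}


def assign_groups(records):
    """Assign each record to exactly one group (hypothetical helper).

    records: dict mapping a record id to its set of keys.
    Returns a dict mapping each record id to a group number.
    """
    groups = {}        # record id -> group number
    key_to_group = {}  # key -> group number of the first record that carried it
    next_group = 0
    for rec_id, keys in records.items():
        group = None
        for key in sorted(keys):
            if key in key_to_group:
                group = key_to_group[key]
                break  # stop at the first match: one group per record
        if group is None:
            group = next_group
            next_group += 1
        groups[rec_id] = group
        for key in keys:
            key_to_group.setdefault(key, group)
    return groups


records = {
    "print": build_keys(["jama", "jama chicago ill"], ["0098-7484"]),
    "online": build_keys(["jama"], ["0098-7484", "1538-3598"]),
    "other": build_keys(["nature"], ["0028-0836"]),
}
groups = assign_groups(records)
```

In this sketch, "print" and "online" share the key `jama~0098-7484` and land in the same group, while "other" gets a group of its own.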
Journals
The key definitions below apply to serial records only (as determined by LDR/07).
Numeric identifiers evaluated (at least one matching numeric identifier must be present in both records):
- 022 $a (excluding content after the space if present)
- 022 $l (excluding content after the space if present)
- 035 $a if string starts with OCoLC
- 775 $x (excluding content after the space if present)
- 776 $x (excluding content after the space if present)
- 776 $w (only the OCLC number is used)
Titles evaluated (titles are normalized for punctuation and capitalization, and initial articles are removed based on the 2nd indicator). At least one matching title must be present in both records:
- 245 $a, $b, $n, $p
- 222 $a
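The identifier and title rules listed above can be approximated with a short Python sketch. The exact normalization rules Primo applies are not reproduced here. The helper names are hypothetical, and the sketch assumes that punctuation is replaced by a space, that the 2nd indicator gives the number of leading characters to skip, and that OCLC numbers are kept with their `(OCoLC)` prefix, as in the example keys on this page.

```python
import re


def normalize_identifier(value):
    """Keep only the text before the first space, e.g. a trailing qualifier is dropped."""
    return value.strip().split(" ", 1)[0]


def oclc_number(value):
    """Return the value only if it is an OCLC number (assumed '(OCoLC)' prefix)."""
    value = value.strip()
    return value if value.startswith("(OCoLC)") else None


def normalize_title(title, nonfiling=0):
    """Drop the initial article, lowercase, and strip punctuation.

    nonfiling: number of leading characters to skip (the 2nd indicator).
    """
    text = title[nonfiling:].lower()
    text = re.sub(r"[^\w\s]", " ", text)  # assumption: punctuation becomes a space
    return " ".join(text.split())         # collapse runs of whitespace


title_245 = normalize_title("JAMA : the journal of the American Medical Association.")
title_222 = normalize_title("JAMA (Chicago, Ill.)")
issn = normalize_identifier("0098-7484 (print)")
```

With these assumptions, the two sample titles normalize to the title parts seen in the JAMA example keys.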
The table below compares the record keys for the print record for JAMA with those of the Alma Community Zone record for the e-version of JAMA. In this case, 3 of the keys match (only 1 is needed to cluster the records together).
Record 1 keys (print JAMA) | Record 2 keys (e-JAMA) |
---|---|
jama the journal of the american medical association~0221-7678 | |
jama the journal of the american medical association~0211-4445 | |
jama the journal of the american medical association~(OCoLC)36366429 | jama the journal of the american medical association~(OCoLC)36366429 |
jama the journal of the american medical association~(OCoLC)1124917 | |
jama the journal of the american medical association~1538-3598 | jama the journal of the american medical association~1538-3598 |
jama the journal of the american medical association~0098-7484 | jama the journal of the american medical association~0098-7484 |
jama chicago ill~0098-7484 | |
jama~0098-7484 | |
jama chicago ill~(OCoLC)36366429 | |
jama~(OCoLC)36366429 | |
jama chicago ill~(OCoLC)1124917 | |
jama chicago ill~0211-4445 | |
jama chicago ill~1538-3598 | |
jama~1538-3598 | |
jama chicago ill~0221-7678 | |
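As a quick check of the example above, the two key lists can be compared as Python sets. This is only an illustration using the keys from the table. The variable names are hypothetical, and the set intersection stands in for whatever comparison the system actually performs.

```python
LONG_TITLE = "jama the journal of the american medical association"

# Keys of the print record, copied from the first column of the table.
print_keys = {
    f"{LONG_TITLE}~0221-7678",
    f"{LONG_TITLE}~0211-4445",
    f"{LONG_TITLE}~(OCoLC)36366429",
    f"{LONG_TITLE}~(OCoLC)1124917",
    f"{LONG_TITLE}~1538-3598",
    f"{LONG_TITLE}~0098-7484",
    "jama chicago ill~0098-7484",
    "jama~0098-7484",
    "jama chicago ill~(OCoLC)36366429",
    "jama~(OCoLC)36366429",
    "jama chicago ill~(OCoLC)1124917",
    "jama chicago ill~0211-4445",
    "jama chicago ill~1538-3598",
    "jama~1538-3598",
    "jama chicago ill~0221-7678",
}

# Keys of the electronic record, copied from the second column of the table.
e_keys = {
    f"{LONG_TITLE}~(OCoLC)36366429",
    f"{LONG_TITLE}~1538-3598",
    f"{LONG_TITLE}~0098-7484",
}

shared = print_keys & e_keys   # keys present in both records
should_cluster = bool(shared)  # a single shared key is enough
```

Here `shared` holds the three matching keys from the table, so the two records fall into the same group.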
To see a record's FRBR keys, view the PNX by opening the full record in your browser and adding this to the end of the URL: &showPnx=true
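For anyone scripting this check, a minimal Python sketch for appending the parameter to a full-record URL follows. The function name `with_show_pnx` and the example host are hypothetical. The sketch assumes the parameter is spelled `showPnx` and only adds it to the query string, leaving the rest of the URL untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def with_show_pnx(url):
    """Return the URL with showPnx=true appended to its query string."""
    parts = urlsplit(url)
    # Use '&' only when the URL already has a query string.
    query = f"{parts.query}&showPnx=true" if parts.query else "showPnx=true"
    return urlunsplit(parts._replace(query=query))


# Hypothetical full-record URL, for illustration only.
pnx_url = with_show_pnx("https://primo.example.org/discovery/fulldisplay?docid=alma123")
```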
Other formats
Records in any format with ISBNs are also clustered. The matching algorithm is the same, except that ISBN fields are used instead of ISSN fields, i.e.:
- 020 $a (excluding content after the space if present)
- 035 $a if string starts with OCoLC
- 775 $z (excluding content after the space if present)
- 776 $z (excluding content after the space if present)
- 776 $w (only the OCLC number is used)