SC 99-05 Stefan Lohrum, Wolfram Schneider, Josef Willenborg: De-duplication in KOBV
Abstract: In KOBV we offer the user an efficient tool for searching
regionally and worldwide accessible library catalogues (the KOBV search engine).
Searches are performed by distributed Z39.50 retrieval and by an index-based
quicksearch. Because of the number of catalogues searched, result sets may
contain a significant number of duplicate records.
We therefore integrate a de-duplication procedure into the KOBV search engine.
It is part of both the distributed search and the KOBV quicksearch.
The main goals are the presentation of uniform retrieval results, the preservation
of retrieval quality, and the removal of redundant information. Last but not least,
we keep an eye on efficiency.
De-duplication is fully parametrizable, so that settings can easily be changed
online.
Keywords: de-duplication method, distributed library system, clustering
MSC: 94-XX
CR: H.3.6, H.3.3