ZIB PaperWeb

De-duplication in KOBV


SC 99-05 Stefan Lohrum, Wolfram Schneider, Josef Willenborg: De-duplication in KOBV


Abstract: In KOBV we offer the user an efficient tool for searching regionally and globally accessible library catalogues (the KOBV search engine). Searches are performed by distributed Z39.50 retrieval and by an index-based quicksearch. Because of the number of catalogues searched, result sets may contain a significant number of duplicate records.
We therefore integrate a de-duplication procedure into the KOBV search engine. It is part of both the distributed search and the KOBV quicksearch.
The main goals are to present uniform retrieval results, to preserve retrieval quality, and to cut redundant information. Last but not least, we keep an eye on efficiency. De-duplication is fully parametrizable, so settings can easily be changed online.
Keywords: de-duplication method, distributed library system, clustering
MSC: 94-XX
CR: H.3.6, H.3.3
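
Illustration: the abstract does not describe the de-duplication method itself, so the following is only a minimal sketch of one common approach, assuming that records from different catalogues are clustered on a normalized match key. The field names (title, author, year), the normalization rules, and the function names are hypothetical and are not taken from the KOBV implementation. The fields parameter stands in for the kind of setting the abstract calls parametrizable.

```python
import re
import unicodedata

# Hypothetical default setting: which fields make up the match key.
DEFAULT_FIELDS = ("title", "author", "year")


def normalize(value):
    """Strip accents, lower-case, drop punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", str(value))
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def match_key(record, fields=DEFAULT_FIELDS):
    """Build the comparison key for one record from the chosen fields."""
    return tuple(normalize(record.get(field, "")) for field in fields)


def deduplicate(records, fields=DEFAULT_FIELDS):
    """Group records with identical match keys.

    Returns a list of clusters (lists of records) in first-seen order.
    No record is discarded, so each cluster can still be shown with all
    of its holding catalogues.
    """
    clusters = {}
    for record in records:
        clusters.setdefault(match_key(record, fields), []).append(record)
    return list(clusters.values())


# Example: the same title reported by two catalogues with small differences.
records = [
    {"title": "De-duplication in KOBV", "author": "Lohrum, Stefan",
     "year": 1999, "source": "catalogue A"},
    {"title": "De-Duplication in KOBV.", "author": "Lohrum, Stefan",
     "year": "1999", "source": "catalogue B"},
    {"title": "Another Title", "author": "Schneider, Wolfram",
     "year": 1999, "source": "catalogue A"},
]
clusters = deduplicate(records)
```

Exact matching on a normalized key is cheap and runs in a single pass over the result set, but it misses duplicates whose fields differ by more than case, accents, or punctuation. Changing the fields argument changes how strictly records are merged.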