ZIB PaperWeb

Clustering Data of Different Information Levels

SC 99-42 Tobias Galliat: Clustering Data of Different Information Levels

Abstract: Applying Data Mining, and cluster analysis in particular, requires measures that determine the similarity or distance between data objects. In many application fields the data objects can have different information levels. In this case the widely used Euclidean distance is an inappropriate measure. The present paper describes a concept for using data of different information levels in cluster analysis and suggests an appropriate similarity measure. A practical example is included that shows the usefulness of the concept and the measure in combination with Kohonen's Self-Organizing Map algorithm, a well-known and powerful tool for cluster analysis.
Keywords: cluster analysis, Data Mining, data preprocessing, information theory, missing values, Self-Organizing Maps, similarity measures
MSC: 62H30, 68T05, 94A17