SC 99-42 Tobias Galliat: Clustering Data of Different Information Levels
Abstract: To apply data mining, especially cluster analysis, one needs measures that determine the similarity or distance between data objects. In many application fields, the data objects can have different information levels. In this case, the widely used Euclidean distance is an inappropriate measure. The present paper describes a concept for using data of different information levels in cluster analysis and proposes an appropriate similarity measure. A practical example is included that shows the usefulness of the concept and the measure in combination with Kohonen's Self-Organizing Map algorithm, a well-known and powerful tool for cluster analysis.
Keywords: cluster analysis, Data Mining, data preprocessing, information theory, missing values, Self-Organizing Maps, similarity measures
MSC: 62H30, 68T05, 94A17