What are clusters in high dimensions and are they difficult to find?
F. Klawonn, F. Höppner
Published in Springer Verlag
2015
Volume: 7627
Pages: 14-33
Abstract
The distribution of distances between points in a high-dimensional data set tends to look quite different from the distribution of distances in a low-dimensional data set. Concentration of norm is one of the phenomena from which high-dimensional data sets can suffer. It means that in high dimensions, under certain general assumptions, the distances from any point to its closest and farthest neighbours tend to be almost identical in relative terms. Since cluster analysis is usually based on distances, such effects must be taken into account and their influence on cluster analysis needs to be considered. This paper investigates the consequences that the special properties of high-dimensional data have for cluster analysis. We discuss questions such as when clustering in high dimensions is meaningful at all, whether the clusters can be mere artifacts, and what the algorithmic problems are for clustering methods in high dimensions. © Springer-Verlag Berlin Heidelberg 2015.
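The concentration effect described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper; it assumes points drawn uniformly at random from the unit hypercube and Euclidean distance, and the function name `relative_contrast`, the sample size, and the seed are hypothetical choices made for illustration. It computes the relative contrast (d_max - d_min) / d_min between the farthest and nearest neighbour of one query point.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Relative gap between farthest and nearest neighbour of a query point.

    Assumption (not from the paper): the query and all data points are
    drawn uniformly from [0, 1]^dim, and distances are Euclidean.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast typically shrinks as the dimension grows.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions the printed contrast should drop sharply from 2 to 1000 dimensions, which is the sense in which nearest and farthest neighbours become almost indistinguishable. Data with genuine cluster structure may behave differently, which is the question the paper examines.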