A Study of Data Clusterability
Speaker: Rita Ackerman
We investigate measures of the clusterability of data sets. Namely,
ways to define how `strong' or `conclusive' is the clustering
structure of a given data set.
Our analysis discovers an interesting phenomenon; Although most of
the common clustering tasks are NP-hard, finding a
close-to-optimal clustering for well clusterable data sets is easy
(computationally). We prove instances of this general claim with
respect to the various clusterability notions that we discuss.
We survey several notions of clusterability that have been discussed
in the literature, as well as propose a new notion of data
clusterability. Our comparison of these notions reveals that, although they all
attempt to evaluate the same intuitive property, they are pairwise
inconsistent.
Finally, we investigate how hard it is to determine the
clusterability value of a given data set. In most cases, it turns
out that this is an NP-hard problem.
Joint work with Shai Ben-David.