A Sober look at Clustering Stability
Speaker: David Pal
Stability is a common tool to verify the validity of sample
based algorithms. In clustering it is widely used to tune
the parameters of the algorithm, such as the number k of
clusters. In spite of the popularity of stability in
practical applications, there has been very little theoretical analysis of
this notion. In this paper we
provide a formal definition of stability and analyze some of
its basic properties. Quite surprisingly, the conclusion of
our analysis is that for large sample size, stability is
fully determined by the behavior of the objective function
which the clustering algorithm is aiming to minimize. If the
objective function has a unique global minimizer, the
algorithm is stable, otherwise it is unstable. In particular
we conclude that stability is not a well-suited tool to
determine the number of clusters - it is determined by the
symmetries of the data which may be unrelated to
clustering parameters. We prove our results for center-based
clusterings and for spectral clustering, and support our
conclusions by many examples in which the behavior of
stability is counter-intuitive.
Joint work with Shai Ben-David and Ulrike von Luxburg