Clustering the Google Distance Using Graph Cuts

Speaker: Thomas Zeugmann, Hokkaido University, Japan

Clustering algorithms working with a matrix of pairwise similarities (kernel matrix) for the data are widely known and used, a particularly popular class being spectral clustering algorithms. In contrast, algorithms working with the pairwise distance matrix have been studied rarely for clustering. This is surprising, as in many applications, distances are directly given, and computing similarities involves another step that is error-prone, since the kernel has to be chosen appropriately, albeit computationally cheap.

In the talk we propose a clustering algorithm based on the SDP relaxation of the max-k-cut of the graph of pairwise distances, based on the work of Frieze and Jerrum. Experimental results are presented indicating that the more direct approach yields better results.

Finally, we shortly propose a simple heuristic for dealing with missing data, i.e., the case where some of the pairwise distances or similarities are not known. --