Finding Multiple Stable Clusterings
Multi-clustering, which tries to find multiple independent ways to partition a data set into groups, has enjoyed many applications, such as customer relationship management, bioinformatics and healthcare informatics. This paper addresses two fundamental questions in multi-clustering: How to model quality of clusterings and how to find multiple stable clusterings (MSC). We introduce to multi-clustering the notion of clustering stability based on Laplacian eigengap, which was originally used by the regularized spectral learning method for similarity matrix learning. We mathematically prove that the larger the eigengap, the more stable the clustering. Furthermore, we propose a novel multi-clustering method MSC. An advantage of our method comparing to the state-of-the-art multi-clustering methods is that our method can provide users a feature subspace to understand each clustering solution. Another advantage is that MSC does not need users to specify the number of clusters and the number of alternative clusterings, which is usually difficult for users without any guidance. Our method can heuristically estimate the number of stable clusterings in a data set. We also discuss a practical way to make MSC applicable to large-scale data. We report an extensive empirical study that clearly demonstrates the effectiveness of our method.
Knowledge and Information Systems
pre print, post print (12 month embargo)
Hu, Juhua; Qian, Qi; Pei, Jian; Jin, Rong; and Zhu, Shenghuo, "Finding Multiple Stable Clusterings" (2017). School of Engineering and Technology Publications. 202.