CNMBI: Determining the Number of Clusters Using Center Pairwise Matching and Boundary Filtering
Ruilin Zhang, Haiyang Zheng, Hongpeng Wang

TL;DR
CNMBI introduces a novel, data-distribution-based method for determining the optimal number of clusters without relying on traditional validation indices, suitable for complex, high-dimensional data.
Contribution
The paper presents CNMBI, a new approach that uses center pairwise matching and boundary filtering, incorporating confidence levels to improve cluster number estimation.
Findings
CNMBI outperforms state-of-the-art methods on challenging datasets.
It is robust across different data dimensions and shapes.
Active removal of low-confidence samples enhances accuracy.
Abstract
One of the main challenges in data mining is choosing the optimal number of clusters without prior information. Existing methods usually follow the philosophy of cluster validation and therefore carry underlying assumptions about the data distribution, which prevents their application to complex data such as large-scale images and high-dimensional real-world data. To address this, we propose an approach named CNMBI. Leveraging the distribution information inherent in the data space, we cast the target task as a dynamic comparison process between cluster centers with respect to their positional behavior, which requires neither complete clustering results nor a complex validity index as earlier methods do. Bipartite graph theory is then employed to model this process efficiently. Additionally, we find that different samples have different confidence levels and thereby actively remove…
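The two ideas named in the abstract, pairing cluster centers through a bipartite graph and dropping low-confidence samples, can be illustrated with a small sketch. This is not the authors' algorithm: the truncated abstract does not specify it. The function names `match_centers` and `filter_boundary`, the Euclidean cost, the brute-force search, and the `min_margin` threshold are all assumptions chosen for illustration.

```python
from itertools import permutations
from math import dist  # Euclidean distance, Python 3.8+


def match_centers(centers_a, centers_b):
    """Minimum-cost one-to-one matching of centers_a onto centers_b.

    Hypothetical helper: brute-force bipartite matching, practical only
    for a handful of centers. Requires len(centers_a) <= len(centers_b).
    """
    best_cost, best_pairs = float("inf"), []
    for perm in permutations(range(len(centers_b)), len(centers_a)):
        cost = sum(dist(centers_a[i], centers_b[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(enumerate(perm))
    return best_pairs, best_cost


def filter_boundary(points, centers, min_margin=0.2):
    """Keep points whose nearest center is clearly closer than the runner-up.

    Hypothetical confidence measure: the relative gap between the two
    smallest point-to-center distances. Needs at least two centers.
    """
    kept = []
    for p in points:
        d = sorted(dist(p, c) for c in centers)
        margin = (d[1] - d[0]) / d[1] if d[1] > 0 else 0.0
        if margin >= min_margin:
            kept.append(p)
    return kept


# Toy data (made up for this sketch).
centers_a = [(0.0, 0.0), (5.0, 5.0)]
centers_b = [(5.1, 4.9), (0.2, -0.1), (9.0, 0.0)]
pairs, cost = match_centers(centers_a, centers_b)

points = [(0.1, 0.0), (2.5, 2.5), (5.0, 5.2)]
confident = filter_boundary(points, centers_a)
```

Here `pairs` links each center in `centers_a` to its counterpart in `centers_b`, leaving the third center in `centers_b` unmatched, and `confident` drops the point that lies midway between the two centers. For larger center sets, a polynomial-time assignment solver would replace the brute-force loop.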
