Dying Clusters Is All You Need -- Deep Clustering With an Unknown Number of Clusters
Collin Leiber, Niklas Strauß, Matthias Schubert, Thomas Seidl

TL;DR
This paper introduces UNSEEN, a versatile deep clustering framework that estimates the number of clusters from an upper bound, improving adaptability across various algorithms and datasets without prior knowledge of cluster count.
Contribution
UNSEEN is the first general framework capable of estimating the number of clusters in deep clustering, compatible with multiple algorithms and independent of initial embedding quality.
Findings
UNSEEN effectively estimates cluster numbers across diverse datasets.
Combining UNSEEN with existing algorithms improves clustering accuracy.
Extensive experiments validate the robustness and versatility of the approach.
Abstract
Finding meaningful groups, i.e., clusters, in high-dimensional data such as images or texts without labeled data at hand is an important challenge in data mining. In recent years, deep clustering methods have achieved remarkable results in these tasks. However, most of these methods require the user to specify the number of clusters in advance. This is a major limitation since the number of clusters is typically unknown if labeled data is unavailable. Thus, an area of research has emerged that addresses this problem. Most of these approaches estimate the number of clusters separately from the clustering process. This results in a strong dependency of the clustering result on the quality of the initial embedding. Other approaches are tailored to specific clustering processes, making them hard to adapt to other scenarios. In this paper, we propose UNSEEN, a general framework that, starting…
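To make the idea of estimating the cluster count from an upper bound concrete, here is a minimal Python sketch. It is not the UNSEEN algorithm, whose details are not given in the text above; it is plain k-means on raw points that starts with `k_max` centers and lets the smallest cluster "die" whenever it falls below a size threshold. The function names, the `min_fraction` threshold, and the farthest-first initialization are all assumptions made for illustration.

```python
import numpy as np


def farthest_first_init(X, k):
    """Pick k initial centers by farthest-first traversal (deterministic).

    Assumption for this sketch: any initialization could be used instead.
    """
    chosen = [0]
    min_dist = np.linalg.norm(X - X[0], axis=1)
    for _ in range(k - 1):
        nxt = int(min_dist.argmax())  # point farthest from all chosen centers
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return X[chosen].copy()


def assign(X, centers):
    """Return the index of the nearest center for every point."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans_dying_clusters(X, k_max, min_fraction=0.2, n_iter=100):
    """k-means that starts from an upper bound k_max and drops small clusters.

    Hypothetical dying criterion: a cluster dies when it holds fewer than
    min_fraction * len(X) points. min_fraction must be > 0 and <= 1.
    """
    centers = farthest_first_init(X, k_max)
    min_size = min_fraction * len(X)
    for _ in range(n_iter):
        labels = assign(X, centers)
        sizes = np.bincount(labels, minlength=len(centers))
        smallest = int(sizes.argmin())
        if len(centers) > 1 and sizes[smallest] < min_size:
            # The smallest cluster dies; its points are reassigned next round.
            centers = np.delete(centers, smallest, axis=0)
            continue
        # All surviving clusters are large enough: standard centroid update.
        new_centers = np.stack(
            [X[labels == j].mean(axis=0) for j in range(len(centers))]
        )
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(X, centers)


# Toy data: three well-separated square blobs of 100 points each.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.concatenate(
    [c + rng.uniform(-1.0, 1.0, size=(100, 2)) for c in blob_centers]
)

centers, labels = kmeans_dying_clusters(X, k_max=10)
print(len(centers))  # 3 surviving clusters on this toy data
```

On this toy data the threshold of 60 points exceeds half a blob, so any blob split between two centers loses one of them until a single center per blob remains. A deep clustering method would additionally update an embedding during training, which this sketch deliberately leaves out.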
Taxonomy
Topics: Advanced Clustering Algorithms Research · Artificial Intelligence in Healthcare
