Scaling Up Deep Clustering Methods Beyond ImageNet-1K
Nikolas Adaloglou, Felix Michels, Kaspar Senft, Diana Petrusheva, Markus Kollmann

TL;DR
This paper evaluates deep clustering methods on large-scale datasets like ImageNet21K, revealing their advantages over traditional k-means, especially in complex, imbalanced, and fine-grained classification scenarios.
Contribution
It introduces new large-scale benchmarks for deep clustering and provides a comprehensive analysis of factors affecting their performance.
Findings
Deep clustering outperforms k-means on most large-scale benchmarks.
k-means underperforms deep clustering by large margins on easy-to-classify benchmarks.
Non-primary clusters often capture meaningful coarse classes.
Abstract
Deep image clustering methods are typically evaluated on small-scale balanced classification datasets while feature-based k-means has been applied on proprietary billion-scale datasets. In this work, we explore the performance of feature-based deep clustering approaches on large-scale benchmarks whilst disentangling the impact of the following data-related factors: i) class imbalance, ii) class granularity, iii) easy-to-recognize classes, and iv) the ability to capture multiple classes. Consequently, we develop multiple new benchmarks based on ImageNet21K. Our experimental analysis reveals that feature-based k-means is often unfairly evaluated on balanced datasets. However, deep clustering methods outperform k-means across most large-scale benchmarks. Interestingly, k-means underperforms on easy-to-classify benchmarks by large margins. The performance gap, however, diminishes on…
Taxonomy
Topics: Advanced Clustering Algorithms Research · Anomaly Detection Techniques and Applications · COVID-19 diagnosis using AI
