Scaling Up Deep Clustering Methods Beyond ImageNet-1K

Nikolas Adaloglou; Felix Michels; Kaspar Senft; Diana Petrusheva; Markus Kollmann

arXiv:2406.01203·cs.CV·June 4, 2024

Open Access

TL;DR

This paper evaluates feature-based deep clustering methods on large-scale benchmarks built from ImageNet21K and compares them with feature-based $k$-means under class imbalance, varying class granularity, and easy-to-recognize classes, finding that deep clustering outperforms $k$-means on most benchmarks.

Contribution

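The paper introduces multiple new large-scale benchmarks for deep clustering based on ImageNet21K and analyzes how data-related factors (class imbalance, class granularity, easy-to-recognize classes, and the ability to capture multiple classes) affect clustering performance.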

Findings

01

Deep clustering outperforms k-means on most large-scale benchmarks.

02

$k$-means underperforms deep clustering by large margins on easy-to-classify benchmarks.

03

Non-primary clusters often capture meaningful coarse classes.

Abstract

Deep image clustering methods are typically evaluated on small-scale balanced classification datasets while feature-based $k$-means has been applied on proprietary billion-scale datasets. In this work, we explore the performance of feature-based deep clustering approaches on large-scale benchmarks whilst disentangling the impact of the following data-related factors: i) class imbalance, ii) class granularity, iii) easy-to-recognize classes, and iv) the ability to capture multiple classes. Consequently, we develop multiple new benchmarks based on ImageNet21K. Our experimental analysis reveals that feature-based $k$-means is often unfairly evaluated on balanced datasets. However, deep clustering methods outperform $k$-means across most large-scale benchmarks. Interestingly, $k$-means underperforms on easy-to-classify benchmarks by large margins. The performance gap, however, diminishes on…
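
The feature-based $k$-means baseline mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the synthetic Gaussian "features", the class sizes, the function names `kmeans` and `clustering_accuracy`, and the use of Hungarian-matched accuracy as the metric are all assumptions made for illustration. The example runs plain Lloyd's $k$-means on precomputed feature vectors from an imbalanced three-class dataset and scores the result after optimally matching clusters to classes.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def kmeans(features, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on precomputed feature vectors (illustrative)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct random samples.
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    labels = np.zeros(len(features), dtype=np.int64)
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid.
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels


def clustering_accuracy(y_true, y_pred, k):
    """Accuracy after a one-to-one matching of clusters to classes."""
    # Contingency table: rows are predicted clusters, columns are true classes.
    counts = np.zeros((k, k), dtype=np.int64)
    np.add.at(counts, (y_pred, y_true), 1)
    # Hungarian matching that maximises the number of agreeing samples.
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / len(y_true)


# Hypothetical imbalanced dataset: three classes with 500, 100 and 20 samples.
data_rng = np.random.default_rng(0)
sizes = [500, 100, 20]
centers = np.array([[0.0] * 8, [6.0] * 8, [-6.0] * 8])
features = np.concatenate(
    [data_rng.normal(loc=c, scale=1.0, size=(n, 8)) for c, n in zip(centers, sizes)]
)
y_true = np.repeat(np.arange(3), sizes)

labels = kmeans(features, k=3)
acc = clustering_accuracy(y_true, labels, k=3)
print(f"Hungarian-matched clustering accuracy: {acc:.3f}")
```

Because the centroids are initialised from random samples, most of them are likely to start inside the largest class, so the score on imbalanced data can vary with the seed. The sketch makes no claim about how the paper's methods behave in this setting.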

Taxonomy

Topics: Advanced Clustering Algorithms Research · Anomaly Detection Techniques and Applications · COVID-19 diagnosis using AI