Keep It Light! Simplifying Image Clustering Via Text-Free Adapters
Yicen Li, Haitz Sáez de Ocáriz Borde, Anastasis Kratsios, Paul D. McNicholas

TL;DR
This paper introduces SCP, a simple, text-free clustering method that leverages pre-trained vision models and positive data pairs, achieving competitive results without complex multi-modal pipelines.
Contribution
The paper presents SCP, a lightweight clustering approach that trains only a small cluster head on top of pre-trained vision model features and removes the need for text-based embeddings.
Findings
SCP achieves competitive performance on multiple benchmark datasets.
Theoretical analysis shows text embeddings may be unnecessary for effective clustering.
SCP reduces training complexity and resource requirements.
Abstract
In the era of pre-trained models, effective classification can often be achieved using simple linear probing or lightweight readout layers. In contrast, many competitive clustering pipelines have a multi-modal design, leveraging large language models (LLMs) or other text encoders, and text-image pairs, which are often unavailable in real-world downstream applications. Additionally, such frameworks are generally complicated to train and require substantial computational resources, making widespread adoption challenging. In this work, we show that in deep clustering, competitive performance with more complex state-of-the-art methods can be achieved using a text-free and highly simplified training pipeline. In particular, our approach, Simple Clustering via Pre-trained models (SCP), trains only a small cluster head while leveraging pre-trained vision model feature representations and…
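As a rough illustration of the setup the abstract describes (a small cluster head on top of pre-trained vision features, supervised only by positive pairs), here is a minimal NumPy sketch. Everything specific in it is an assumption rather than the paper's method: the random `feats` array stands in for frozen backbone features, positive pairs are taken to be nearest neighbours in feature space, the head is a single linear layer, and `pair_loss` is a generic agreement-plus-entropy objective chosen for illustration. Only the forward pass and the loss are shown; an optimiser would update `W` and `b` alone.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cluster_head(features, W, b):
    # Hypothetical linear head: features (N, D) -> soft cluster assignments (N, K).
    return softmax(features @ W + b)

def pair_loss(p_a, p_b, entropy_weight=1.0, eps=1e-8):
    # Illustrative objective (an assumption, not the paper's loss):
    # 1) positive pairs should receive similar assignments;
    # 2) the average assignment should have high entropy, to discourage
    #    collapsing every sample into one cluster.
    agree = -np.log((p_a * p_b).sum(axis=1) + eps).mean()
    mean_p = 0.5 * (p_a.mean(axis=0) + p_b.mean(axis=0))
    entropy = -(mean_p * np.log(mean_p + eps)).sum()
    return agree - entropy_weight * entropy

rng = np.random.default_rng(0)
N, D, K = 256, 64, 10  # samples, feature dimension, clusters (all hypothetical)

# Stand-in for features from a frozen pre-trained vision model.
feats = rng.normal(size=(N, D))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

# Assumed positive pairs: each sample's nearest neighbour by cosine similarity.
sim = feats @ feats.T
np.fill_diagonal(sim, -np.inf)  # exclude self-matches
pos_idx = sim.argmax(axis=1)

# The only trainable parameters in this sketch: D * K + K values.
W = 0.01 * rng.normal(size=(D, K))
b = np.zeros(K)

p_anchor = cluster_head(feats, W, b)
p_pos = cluster_head(feats[pos_idx], W, b)
loss = pair_loss(p_anchor, p_pos)
labels = p_anchor.argmax(axis=1)  # hard cluster assignment per sample
```

Because the backbone is never updated in this sketch, its features can be computed once and cached, which leaves only the `D * K + K` head parameters to optimise.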
Taxonomy
Topics: Image Retrieval and Classification Techniques
