TL;DR
This paper introduces GSEC, an image clustering framework that uses generative semantic guidance to reduce bias and bi-layer ensemble learning to reduce variance. It outperforms 18 state-of-the-art methods on six benchmark datasets.
Contribution
The paper proposes GSEC, combining multimodal large language models and a bi-layer ensemble to improve image clustering by reducing bias and variance simultaneously.
Findings
GSEC outperforms 18 state-of-the-art methods on six benchmark datasets.
The method effectively reduces both bias and variance in clustering.
Experimental results validate the robustness and superiority of GSEC.
Abstract
Image clustering aims to partition unlabeled image datasets into distinct groups. A core aspect of this task is constructing and leveraging prior knowledge to guide the clustering process. Recent approaches introduce semantic descriptions as prior information, and most of them rely on matching-based techniques with predefined vocabularies. However, the limited matching space restricts their adaptability to downstream clustering tasks. Moreover, these methods focus primarily on reducing bias to improve performance and frequently overlook the importance of variance reduction. To address these limitations, we propose GSEC (Image Clustering based on Generative Semantic Guidance and Bi-Layer Ensemble), a framework designed to reduce bias through generative semantic guidance and mitigate variance via ensemble learning. Our method employs Multimodal Large Language Models to generate…
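To make the variance-reduction idea concrete, here is a minimal sketch of a generic consensus (co-association) ensemble over several base clusterings. It is not GSEC's bi-layer ensemble, which the truncated abstract does not describe. The function names `co_association` and `consensus_labels`, the 0.5 threshold, and the toy label vectors are all assumptions made for illustration.

```python
import numpy as np


def co_association(labelings):
    """Fraction of base clusterings that place each pair of samples together."""
    labelings = np.asarray(labelings)  # shape: (n_runs, n_samples)
    same = labelings[:, :, None] == labelings[:, None, :]
    return same.mean(axis=0)  # shape: (n_samples, n_samples)


def consensus_labels(labelings, threshold=0.5):
    """Link samples co-clustered in more than `threshold` of the runs,
    then label the connected components of the resulting graph."""
    co = co_association(labelings)
    n = co.shape[0]
    labels = -np.ones(n, dtype=int)  # -1 marks "not yet assigned"
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in np.flatnonzero(co[i] > threshold):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical base clusterings of six samples. Label ids are arbitrary
# per run, and the second run misplaces sample 2.
runs = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [2, 2, 2, 0, 0, 0],
]
consensus = consensus_labels(runs)
print(consensus.tolist())
```

In this toy input, the second run disagrees with the other two about sample 2, but the majority vote encoded in the co-association matrix keeps it grouped with samples 0 and 1. Averaging over runs in this way is one common route to lowering the variance of a single clustering result.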
