Hierarchical Textual Knowledge for Enhanced Image Clustering

Yijie Zhong; Yunfan Gao; Weipeng Jiang; Haofen Wang

arXiv:2604.11144·cs.CV·April 14, 2026

TL;DR

The paper uses large language models to build hierarchical textual knowledge that guides image clustering, supplying semantic information that visual features alone do not capture.

Contribution

The paper proposes knowledge-enhanced clustering (KEC), a method that constructs hierarchical concept-attribute knowledge to guide image clustering and is reported to outperform existing methods.

Findings

01. KEC improves clustering accuracy across 20 datasets.

02. KEC without training surpasses zero-shot CLIP on 14 datasets.

03. Naive textual knowledge use can harm performance, but KEC is robust.

Abstract

Image clustering aims to group images in an unsupervised fashion. Traditional methods focus on knowledge from visual space, making it difficult to distinguish between visually similar but semantically different classes. Recent advances in vision-language models enable the use of textual knowledge to enhance image clustering. However, most existing methods rely on coarse class labels or simple nouns, overlooking the rich conceptual and attribute-level semantics embedded in textual space. In this paper, we propose a knowledge-enhanced clustering (KEC) method that constructs hierarchical concept-attribute structured knowledge with the help of large language models (LLMs) to guide clustering. Specifically, we first condense redundant textual labels into abstract concepts and then automatically extract discriminative attributes for each single concept and similar concept pairs, via…
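The abstract is cut off before it says how the concept-attribute knowledge is actually applied, so the following is only an illustrative sketch, not the authors' KEC procedure. It assumes that images, concepts, and attributes have already been embedded into a shared space (for example by a vision-language model) and shows one plausible way attribute-level text could separate two concepts that look identical at the concept level. The function names, the `attr_weight` parameter, the `knowledge` layout, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign(image_vec, knowledge, attr_weight=0.5):
    """Pick the concept with the highest blended concept/attribute score.

    knowledge maps a concept name to a dict with:
      "concept":    embedding of the concept name (assumed precomputed)
      "attributes": list of embeddings of its discriminative attributes
    attr_weight is a hypothetical mixing weight, not a value from the paper.
    """
    best_name, best_score = None, -math.inf
    for name, entry in knowledge.items():
        concept_score = cosine(image_vec, entry["concept"])
        attrs = entry["attributes"]
        # Mean similarity to the concept's attributes (0.0 if it has none).
        attr_score = (
            sum(cosine(image_vec, a) for a in attrs) / len(attrs) if attrs else 0.0
        )
        score = (1 - attr_weight) * concept_score + attr_weight * attr_score
        if score > best_score:  # strict '>' keeps the first concept on ties
            best_name, best_score = name, score
    return best_name


# Toy data: two concepts whose concept-level embeddings are identical,
# standing in for visually similar but semantically different classes.
knowledge = {
    "wolf": {"concept": [1.0, 0.0, 0.0], "attributes": [[0.9, 0.1, 0.0]]},
    "husky": {"concept": [1.0, 0.0, 0.0], "attributes": [[0.1, 0.9, 0.0]]},
}
image = [0.6, 0.8, 0.0]

with_attributes = assign(image, knowledge)               # attributes break the tie
concept_only = assign(image, knowledge, attr_weight=0.0)  # tie, first concept wins
```

In this toy setup the concept-only score cannot tell the two classes apart, while the attribute term does. That mirrors the motivation stated in the abstract, but the scoring rule itself is an assumption made for illustration.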
