Hierarchical Semantic Alignment for Image Clustering

Xingyu Zhu; Beier Zhu; Yunfan Li; Junfeng Fang; Shuo Wang; Kesen Zhao; Hanwang Zhang

arXiv:2512.00904 · cs.CV · December 19, 2025

TL;DR

This paper introduces a training-free hierarchical semantic alignment method for image clustering that leverages caption-level and noun-level semantics to improve clustering accuracy, outperforming existing training-free approaches.

Contribution

The proposed CAE method combines caption-level and noun-level semantics with optimal transport to improve image clustering performance in a training-free manner.
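
This summary does not say exactly how CAE uses optimal transport. As a rough illustration only, the sketch below computes an entropy-regularized transport plan (the standard Sinkhorn iteration) between a batch of image embeddings and a set of text embeddings. The helper name `sinkhorn_plan`, the cosine-distance cost, the uniform marginals, and the values of `eps` and `n_iter` are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def sinkhorn_plan(img: np.ndarray, txt: np.ndarray,
                  eps: float = 0.1, n_iter: int = 300) -> np.ndarray:
    """Entropic OT plan between n image embeddings and m text embeddings.

    Hypothetical helper: uniform marginals, cost = 1 - cosine similarity.
    Returns an (n, m) nonnegative matrix; after the final update its columns
    sum exactly to 1/m and its rows sum approximately to 1/n.
    """
    img, txt = l2_normalize(img), l2_normalize(txt)
    cost = 1.0 - img @ txt.T              # (n, m) cosine distance in [0, 2]
    n, m = cost.shape
    a = np.full(n, 1.0 / n)               # marginal over images
    b = np.full(m, 1.0 / m)               # marginal over text units
    K = np.exp(-cost / eps)               # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)                   # rescale rows toward marginal a
        v = b / (K.T @ u)                 # rescale columns to marginal b
    return u[:, None] * K * v[None, :]
```

Multiplying a row of the plan by `n` gives a soft assignment of that image over the text units; a smaller `eps` makes the assignment sharper but slows convergence.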

Findings

1. Achieves 4.2% higher accuracy on ImageNet-1K
2. Surpasses state-of-the-art training-free methods
3. Effective across 8 diverse datasets

Abstract

Image clustering, which groups images into categories, is a classic problem in computer vision. Recent studies use nouns as external semantic knowledge to improve clustering performance. However, these methods often overlook the inherent ambiguity of nouns, which can distort semantic representations and degrade clustering quality. To address this issue, we propose a hierarChical semAntic alignmEnt method for image clustering, dubbed CAE, which improves clustering performance in a training-free manner. Our approach incorporates two complementary types of textual semantics: caption-level descriptions, which convey fine-grained attributes of image content, and noun-level concepts, which represent high-level object categories. We first select relevant nouns from WordNet and descriptions from caption datasets to construct a semantic space aligned with image features.…
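
The selection step in the abstract can be pictured with a minimal sketch. It assumes images and texts (WordNet nouns or captions) are already embedded in a shared space by a CLIP-style encoder, and it uses a made-up criterion: keep any text entry that is among the top-`k` nearest neighbours of at least one image. The helpers `select_relevant` and `semantic_features`, the parameters `k` and `tau`, and the softmax mixing are hypothetical; the paper's actual selection rule is not given in this truncated abstract.

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """Unit-normalize along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def select_relevant(img_emb: np.ndarray, text_emb: np.ndarray, k: int = 2) -> np.ndarray:
    """Hypothetical criterion: indices of text entries in the top-k of any image."""
    sim = normalize(img_emb) @ normalize(text_emb).T   # (n_img, n_text) cosine similarity
    topk = np.argsort(-sim, axis=1)[:, :k]             # k most similar entries per image
    return np.unique(topk)                             # sorted, de-duplicated indices

def semantic_features(img_emb: np.ndarray, text_emb: np.ndarray,
                      idx: np.ndarray, tau: float = 0.01) -> np.ndarray:
    """Map each image to a softmax-weighted mix of the selected text embeddings."""
    txt = normalize(text_emb[idx])
    logits = normalize(img_emb) @ txt.T / tau
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return normalize(w @ txt)                          # unit-length semantic feature
```

Running both functions once with noun embeddings and once with caption embeddings gives a coarse (noun-level) and a fine-grained (caption-level) view of each image. Concatenating these with the image features and running a standard clusterer such as k-means is one plausible way to combine them, though the summary does not state that CAE does this.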

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques