A Sketch+Text Composed Image Retrieval Dataset for Thangka

Jinyu Xu; Yi Sun; Jiangling Zhang; Qing Xie; Daomin Ji; Zhifeng Bao; Jiachen Li; Yanchun Ma; Yongjian Liu

arXiv:2602.08411·cs.IR·April 21, 2026


TL;DR

CIRThan is a new dataset for sketch+text composed image retrieval in Thangka art, highlighting challenges in aligning multimodal inputs with complex, domain-specific imagery.

Contribution

The paper introduces CIRThan, a culturally grounded dataset with hierarchical descriptions for Thangka images, and evaluates existing methods, exposing their limitations in this domain.

Findings

01. Existing CIR methods struggle with fine-grained, domain-specific Thangka images.

02. Hierarchical textual descriptions improve semantic understanding in retrieval.

03. Zero-shot methods perform poorly without in-domain supervision.

Abstract

Composed Image Retrieval (CIR) enables image retrieval by combining multiple query modalities, but existing benchmarks predominantly focus on general-domain imagery and rely on reference images with short textual modifications. As a result, they provide limited support for retrieval scenarios that require fine-grained semantic reasoning, structured visual understanding, and domain-specific knowledge. In this work, we introduce CIRThan, a sketch+text Composed Image Retrieval dataset for Thangka imagery, a culturally grounded and knowledge-specific visual domain characterized by complex structures, dense symbolic elements, and domain-dependent semantic conventions. CIRThan contains 2,287 high-quality Thangka images, each paired with a human-drawn sketch and hierarchical textual descriptions at three semantic levels, enabling composed queries that jointly express structural intent and…
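
To make the data layout described in the abstract concrete, here is a minimal Python sketch of one record (a Thangka image, its human-drawn sketch, and descriptions at three semantic levels) and of a composed sketch+text query built from it. The class names, field names, file paths, and the choice to use a single description level per query are all assumptions made for illustration; they are not taken from the paper or from the CIRThan repository.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ThangkaRecord:
    """Hypothetical record layout; field names are assumptions, not the dataset's schema."""
    image_path: str                      # target Thangka image
    sketch_path: str                     # human-drawn sketch paired with the image
    descriptions: Tuple[str, str, str]   # three semantic levels; ordering is assumed


@dataclass(frozen=True)
class ComposedQuery:
    """A sketch+text query: the sketch carries structure, the text carries semantics."""
    sketch_path: str
    text: str


def make_query(record: ThangkaRecord, level: int) -> ComposedQuery:
    """Pair the record's sketch with the description at one level (0, 1, or 2)."""
    if not 0 <= level < len(record.descriptions):
        raise ValueError(f"level must be in 0..{len(record.descriptions) - 1}")
    return ComposedQuery(record.sketch_path, record.descriptions[level])


# Placeholder values only; real paths and texts would come from the dataset.
example = ThangkaRecord(
    image_path="images/0001.jpg",
    sketch_path="sketches/0001.png",
    descriptions=(
        "placeholder level-1 text",
        "placeholder level-2 text",
        "placeholder level-3 text",
    ),
)
query = make_query(example, level=1)
```

A retrieval model would then encode `query` and rank the candidate images against it; that step is omitted here because it depends on the method being evaluated.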


Code & Models

Repositories

jinyuxu-whut/CIRThan (GitHub)
