TEMA: Anchor the Image, Follow the Text for Multi-Modification Composed Image Retrieval

Zixu Li; Yupeng Hu; Zhiheng Fu; Zhiwei Chen; Yongqi Li; Liqiang Nie

arXiv:2604.21806·cs.CV·April 27, 2026

TL;DR

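TEMA introduces a framework and two instruction-rich datasets, M-FashionIQ and M-CIRR, for multi-modification composed image retrieval (CIR). It targets the insufficient entity coverage and clause-entity misalignment that arise when modification texts are kept simple, and it aims to improve both retrieval accuracy and efficiency in practical scenarios.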

Contribution

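The paper presents TEMA (Text-oriented Entity Mapping Architecture), which the authors describe as the first CIR framework designed for multi-modification queries while still accommodating simple modifications. It also contributes two new instruction-rich multi-modification datasets, M-FashionIQ and M-CIRR, intended to bring CIR closer to real-world use.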

Findings

01  TEMA outperforms existing methods on four benchmark datasets.

02  The new M-FashionIQ and M-CIRR datasets enable more comprehensive evaluation of CIR models.

03  TEMA balances retrieval accuracy with computational efficiency.

Abstract

Composed Image Retrieval (CIR) is an important image retrieval paradigm that enables users to retrieve a target image using a multimodal query consisting of a reference image and a modification text. Although research on CIR has made significant progress, prevailing setups still rely on simple modification texts that typically cover only a limited range of salient changes. This induces two limitations highly relevant to practical applications, namely Insufficient Entity Coverage and Clause-Entity Misalignment. To address these issues and bring CIR closer to real-world use, we construct two instruction-rich multi-modification datasets, M-FashionIQ and M-CIRR. In addition, we propose TEMA, the Text-oriented Entity Mapping Architecture, which is the first CIR framework designed for multi-modification while also accommodating simple modifications. Extensive experiments on four…

Code & Models

Repositories

lee-zixu/ACL26-TEMA (GitHub)

Models

iLearn-Lab/ACL26-TEMA (Hugging Face model)
