HINT: Composed Image Retrieval with Dual-path Compositional Contextualized Network

Mingyu Zhang; Zixu Li; Zhiwei Chen; Zhiheng Fu; Xiaowei Zhu; Jiajia Nie; Yinwei Wei; Yupeng Hu

arXiv:2603.26341 · cs.CV · March 30, 2026


TL;DR

This paper introduces HINT, a dual-path network for composed image retrieval (CIR) that encodes contextual information and amplifies the similarity differences between matching and non-matching samples. The paper reports state-of-the-art results on two CIR benchmarks.

Contribution

The paper proposes HINT, a dual-path compositional contextualized network that performs contextualized encoding and amplifies the similarity differences between matching and non-matching samples in CIR.

Findings

01. HINT achieves state-of-the-art results on two CIR benchmarks.

02. HINT effectively encodes contextual information to distinguish matching samples.

03. HINT improves performance in complex CIR scenarios.

Abstract

Composed Image Retrieval (CIR) is a challenging image retrieval paradigm. Given a multimodal query composed of a reference image and a modification text, it aims to retrieve from a large-scale image database the target images that are consistent with the modification semantics. Although existing methods have made significant progress in cross-modal alignment and feature fusion, a key flaw remains: they neglect contextual information when discriminating matching samples. Addressing this limitation is difficult because of two challenges: 1) implicit dependencies and 2) the lack of a differential amplification mechanism. To address these challenges, we propose a dual-patH composItional coNtextualized neTwork (HINT), which performs contextualized encoding and amplifies the similarity differences between matching and non-matching samples, thus improving the upper-bound performance of CIR…
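
To make the task setup in the abstract concrete, here is a minimal sketch of generic CIR-style ranking. It is not HINT's architecture. Everything in it is an assumption made for illustration: the embeddings are random vectors, `compose_query` is a hypothetical fusion by simple summation, and `match_distribution` uses an ordinary temperature-scaled softmax to show what amplifying the gap between matching and non-matching similarities can look like in general.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def compose_query(ref_image_emb, mod_text_emb):
    """Hypothetical fusion: sum of the two normalized embeddings (not HINT's)."""
    return l2_normalize(l2_normalize(ref_image_emb) + l2_normalize(mod_text_emb))


def rank_candidates(query_emb, candidate_embs):
    """Return candidate indices sorted by descending cosine similarity."""
    sims = l2_normalize(candidate_embs) @ l2_normalize(query_emb)
    return np.argsort(-sims), sims


def match_distribution(sims, temperature):
    """Softmax over similarities; a lower temperature widens the gaps."""
    z = sims / temperature
    z = z - z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()


# Toy data: random stand-ins for encoder outputs (assumption, not real features).
rng = np.random.default_rng(0)
dim, num_candidates, target_idx = 8, 5, 2

ref_image_emb = rng.normal(size=dim)
mod_text_emb = rng.normal(size=dim)
candidate_embs = rng.normal(size=(num_candidates, dim))

query_emb = compose_query(ref_image_emb, mod_text_emb)
# Plant a "target image" embedding close to the composed query.
candidate_embs[target_idx] = query_emb + 0.01 * rng.normal(size=dim)

order, sims = rank_candidates(query_emb, candidate_embs)
p_soft = match_distribution(sims, temperature=1.0)
p_sharp = match_distribution(sims, temperature=0.05)

print("ranking:", order)
print("target probability at T=1.0 :", p_soft[target_idx])
print("target probability at T=0.05:", p_sharp[target_idx])
```

In this toy setup the planted target ranks first, and lowering the temperature concentrates more probability mass on it. The paper's own contextualized encoding and amplification mechanism are described in the full text and the linked repository, not here.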


Code & Models

Repositories

zh-mingyu/HINT (GitHub)

Models

iLearn-Lab/HINT (Hugging Face model)
