INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval

Zhiwei Chen; Yupeng Hu; Zhiheng Fu; Zixu Li; Jiale Huang; Qinlei Huang; Yinwei Wei

arXiv:2604.18051 · cs.CV · April 21, 2026

TL;DR

This paper introduces INTENT, a noise mitigation framework for composed image retrieval (CIR) that addresses both cross-modal correspondence noise and modality-inherent noise, improving robustness on real-world datasets that contain incorrectly matched triplets.

Contribution

The paper proposes INTENT, a two-component model that combines FFT-based visual invariance with discrimination-aware learning to handle the different types of noise found in CIR datasets.
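The summary does not say how INTENT applies the FFT. As a hedged illustration of the general idea behind Fourier-based invariance (common in domain-generalization work, where the phase spectrum tends to carry image structure and the amplitude spectrum carries low-level style), the sketch below splits an image into amplitude and phase with NumPy and interpolates the amplitude toward a second image while keeping the original phase. The function names and the mixing weight `lam` are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def amplitude_phase(img):
    """Return the Fourier amplitude and phase of an (H, W) or (H, W, C) image, per channel."""
    spectrum = np.fft.fft2(img, axes=(0, 1))  # 2-D FFT over the spatial axes only
    return np.abs(spectrum), np.angle(spectrum)


def recombine(amplitude, phase):
    """Rebuild a real-valued image from an amplitude spectrum and a phase spectrum."""
    spectrum = amplitude * np.exp(1j * phase)
    # For real inputs both amplitude spectra are symmetric and the phase is antisymmetric,
    # so the inverse FFT is real up to rounding; drop the ~0 imaginary part.
    return np.real(np.fft.ifft2(spectrum, axes=(0, 1)))


def amplitude_mix(img_a, img_b, lam=0.5):
    """Interpolate img_a's amplitude toward img_b's while keeping img_a's phase (hypothetical helper).

    lam=0 returns img_a (up to floating-point error); lam=1 keeps only img_b's amplitude.
    """
    amp_a, pha_a = amplitude_phase(img_a)
    amp_b, _ = amplitude_phase(img_b)
    mixed_amp = (1.0 - lam) * amp_a + lam * amp_b
    return recombine(mixed_amp, pha_a)


rng = np.random.default_rng(0)
img_a = rng.random((8, 8, 3))  # stand-in for a reference image
img_b = rng.random((8, 8, 3))  # stand-in for an unrelated image supplying "style"
augmented = amplitude_mix(img_a, img_b, lam=0.5)
```

In invariance-oriented training, one would typically encourage a model to produce similar features for `img_a` and `amplitude_mix(img_a, img_b)`, treating the amplitude change as a nuisance factor; whether INTENT uses this exact objective is not stated in the summary.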

Findings

01

INTENT outperforms existing methods on benchmark datasets.

02

The approach effectively reduces the impact of annotation errors.

03

Experimental results show improved robustness and accuracy.

Abstract

Composed Image Retrieval (CIR) is a challenging image retrieval paradigm that retrieves target images based on multimodal queries consisting of reference images and modification texts. Although substantial progress has been made in recent years, existing methods assume that all samples are correctly matched. However, in real-world scenarios, the high cost of annotating triplets means that CIR datasets inevitably contain annotation errors, resulting in incorrectly matched triplets. Consequently, the problem of Noisy Triplet Correspondence (NTC) has attracted growing attention. We argue that noise in CIR can be categorized into two types: cross-modal correspondence noise and modality-inherent noise. The former arises from mismatches across modalities, whereas the latter originates from intra-modal background interference or visual factors irrelevant to the coarse-grained…
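To make Noisy Triplet Correspondence concrete, here is a minimal sketch, under stated assumptions, of one generic way to flag suspect triplets: the small-loss criterion from noisy-label learning applied to a per-triplet contrastive (InfoNCE) loss. The additive fusion of reference and text embeddings, the temperature, `keep_ratio`, the synthetic data, and all function names are hypothetical; the abstract does not say INTENT works this way.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def per_triplet_infonce(ref, text, target, temperature=0.07):
    """Per-sample InfoNCE loss: composed query i should match target i among all batch targets."""
    query = l2_normalize(ref + text)  # hypothetical fusion: simple sum of embeddings
    tgt = l2_normalize(target)
    logits = query @ tgt.T / temperature  # (B, B) cosine-similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability for log-softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs)  # loss of each query against its own annotated target


def flag_suspected_noise(losses, keep_ratio=0.75):
    """Small-loss criterion: keep the keep_ratio fraction with the lowest loss, flag the rest."""
    n_keep = int(round(keep_ratio * len(losses)))
    order = np.argsort(losses)
    flags = np.ones(len(losses), dtype=bool)
    flags[order[:n_keep]] = False
    return flags


# Synthetic batch: clean triplets have target ≈ reference + modification.
rng = np.random.default_rng(0)
B, D = 8, 64
ref = rng.normal(size=(B, D))
text = rng.normal(size=(B, D))
target = ref + text + 0.05 * rng.normal(size=(B, D))
target[[2, 5]] = target[[5, 2]]  # simulate annotation errors by swapping two targets

losses = per_triplet_infonce(ref, text, target)
flags = flag_suspected_noise(losses, keep_ratio=0.75)
print(np.flatnonzero(flags))  # prints [2 5]
```

The swapped triplets receive large losses because each composed query is most similar to a target annotated for a different sample, so the small-loss rule separates them. A fixed `keep_ratio` assumes the noise rate is known; practical methods often estimate it instead, for example by fitting a mixture model to the loss distribution.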

Videos

INTENT: Invariance and Discrimination-aware Noise Mitigation for Robust Composed Image Retrieval (Underline)