Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang, Jing Yu, Keke Gai, Jiamin Zhuang, Gaopeng Gou, Gang Xiong, Qi Wu

TL;DR
Denoise-I2W introduces a denoising image-to-word mapping method that improves zero-shot composed image retrieval by accurately capturing manipulation intentions, leading to state-of-the-art results across multiple datasets.
Contribution
The paper proposes a novel denoising image-to-word mapping approach that enhances zero-shot composed image retrieval without requiring additional annotations.
Findings
Significant performance improvements (1.45% to 4.17%) over existing methods.
Strong generalization across multiple ZS-CIR models and datasets.
Achieves new state-of-the-art results on benchmark datasets.
Abstract
Zero-Shot Composed Image Retrieval (ZS-CIR) supports diverse tasks with a broad range of visual content manipulation intentions that can be related to domain, scene, object, and attribute. A key challenge for ZS-CIR is to accurately map an image representation to a pseudo-word token that captures the image information relevant to the manipulation intention for generalized CIR. However, existing methods suffer from a discrepancy between the retrieval and pre-training stages, which leads to significant redundancy in the pseudo-word tokens. In this paper, we propose a novel denoising image-to-word mapping approach, named Denoise-I2W, for mapping images into denoising pseudo-word tokens that, without intention-irrelevant visual information, enhance accurate ZS-CIR. Specifically, a pseudo triplet construction module first automatically constructs pseudo triples (i.e., a pseudo-reference image, a pseudo-manipulation text, and…
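The general ZS-CIR pipeline described in the abstract (map a reference image to a pseudo-word token, combine it with a manipulation text, then retrieve) can be illustrated with a toy sketch. This is not the authors' implementation: the three-dimensional embeddings, the function names `map_image_to_word`, `compose_query`, and `retrieve`, the hand-set linear mapping, and the additive composition are all assumptions chosen only to make the idea concrete. A real system would use a learned mapping network and a pre-trained vision-language text encoder.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def map_image_to_word(image_emb, weights):
    """Hypothetical image-to-word mapping: a plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, image_emb)) for row in weights]

def compose_query(pseudo_word, text_emb):
    """Hypothetical composition: add the pseudo-word token to the text embedding.
    Stands in for encoding a prompt that contains the pseudo-word."""
    return [p + t for p, t in zip(pseudo_word, text_emb)]

def retrieve(query, gallery):
    """Return gallery names ranked by cosine similarity to the query."""
    return sorted(gallery, key=lambda name: cosine(query, gallery[name]), reverse=True)

# Toy embedding axes (assumed for illustration): [dog, beach, snow]
reference_image = [1.0, 1.0, 0.0]   # a dog on a beach
manipulation_text = [0.0, 0.0, 1.0]  # "in the snow"

# Hand-set mapping that keeps the object and drops the scene, mimicking a
# pseudo-word token without intention-irrelevant information.
weights = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

gallery = {
    "dog_snow": [1.0, 0.0, 1.0],
    "dog_beach": [1.0, 1.0, 0.0],
    "cat_snow": [0.0, 0.0, 1.0],
}

pseudo_word = map_image_to_word(reference_image, weights)
query = compose_query(pseudo_word, manipulation_text)
ranking = retrieve(query, gallery)
print(ranking)
```

In this toy setting, the mapping discards the beach component before composition, so the composed query points toward a dog in the snow. If the mapping kept the full image embedding, the beach component would remain in the query and compete with the requested scene change, which is the kind of redundancy the abstract refers to.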
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Image Processing and 3D Reconstruction
