Target-Guided Composed Image Retrieval
Haokun Wen, Xian Zhang, Xuemeng Song, Yinwei Wei, Liqiang Nie

TL;DR
This paper introduces TG-CIR, a network for composed image retrieval that models the conflict relationship between the reference image and the modification text, as well as the adaptive matching degree of candidate images, and outperforms existing methods on three benchmark datasets.
Contribution
The paper proposes a Target-Guided Composed Image Retrieval network with conflict relationship modeling and adaptive matching, enhancing multimodal query composition and ranking accuracy.
Findings
Outperforms existing methods on three benchmark datasets.
Effectively models conflict relationships between reference image and modification text.
Improves ranking accuracy through adaptive matching degree modeling.
Abstract
Composed image retrieval (CIR) is a new and flexible image retrieval paradigm, which can retrieve the target image for a multimodal query, including a reference image and its corresponding modification text. Although existing efforts have achieved compelling success, they overlook the conflict relationship modeling between the reference image and the modification text for improving the multimodal query composition and the adaptive matching degree modeling for promoting the ranking of the candidate images that could present different levels of matching degrees with the given query. To address these two limitations, in this work, we propose a Target-Guided Composed Image Retrieval network (TG-CIR). In particular, TG-CIR first extracts the unified global and local attribute features for the reference/target image and the modification text with the contrastive language-image pre-training…
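To make the retrieval setup concrete, here is a minimal sketch of the generic CIR paradigm described in the abstract: a reference-image feature and a modification-text feature are composed into one query, and candidate images are ranked by similarity to it. This is not TG-CIR itself. The element-wise sum used for composition, the function names (`compose_query`, `rank_candidates`), and the toy 3-dimensional vectors are all assumptions made for illustration; in practice the features would come from an encoder such as CLIP.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    return [x / norm for x in v]


def compose_query(ref_image_feat, mod_text_feat):
    """Hypothetical composition: element-wise sum, then normalize.

    Assumption for illustration only; TG-CIR's composition module is
    more involved and is not reproduced here.
    """
    summed = [a + b for a, b in zip(ref_image_feat, mod_text_feat)]
    return l2_normalize(summed)


def rank_candidates(query_feat, candidate_feats):
    """Return candidate indices sorted by cosine similarity, best first."""
    q = l2_normalize(query_feat)
    scored = []
    for idx, cand in enumerate(candidate_feats):
        c = l2_normalize(cand)
        score = sum(a * b for a, b in zip(q, c))
        scored.append((score, idx))
    # Sort by descending score; break ties by the lower index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scored]


# Toy features standing in for encoder outputs (assumed values).
ref = [1.0, 0.0, 0.0]   # reference image
mod = [0.0, 1.0, 0.0]   # modification text
cands = [
    [1.0, 0.0, 0.0],    # identical to the reference image
    [1.0, 1.0, 0.0],    # reference content plus the requested change
    [0.0, 0.0, 1.0],    # unrelated image
]
ranking = rank_candidates(compose_query(ref, mod), cands)
print(ranking)
```

In this toy case the candidate that reflects both the reference content and the requested change ranks first, ahead of the unmodified reference and the unrelated image.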
