Decompose Semantic Shifts for Composed Image Retrieval
Xingyu Yang, Daqing Liu, Heng Zhang, Yong Luo, Chaoyue Wang, Jing Zhang

TL;DR
This paper introduces a Semantic Shift network (SSN) for composed image retrieval that explicitly decomposes user instructions into semantic shifts, significantly improving retrieval accuracy by modeling the transition from reference to target images.
Contribution
The paper proposes a novel SSN model that explicitly decomposes instructions into degradation and upgradation steps, addressing limitations of previous methods that oversimplified textual instructions.
Findings
SSN achieves a 5.42% improvement on the CIRR dataset.
SSN achieves a 1.37% improvement on the FashionIQ dataset.
SSN establishes new state-of-the-art performance.
Abstract
Composed image retrieval is an image retrieval task in which the user provides a reference image as a starting point and a text specifying how to shift from that starting point to the desired target image. However, most existing methods focus on learning the composition of the text and the reference image and oversimplify the text as a description, neglecting its inherent structure and the user's shifting intention. As a result, these methods often take shortcuts that disregard the visual cues of the reference image. To address this issue, we reconsider the text as an instruction and propose a Semantic Shift network (SSN) that explicitly decomposes the semantic shift into two steps: from the reference image to a visual prototype, and from the visual prototype to the target image. Specifically, SSN explicitly decomposes the instructions into two components: degradation and…
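The two-step shift described in the abstract can be illustrated with a toy sketch. This is not the SSN implementation: it assumes hand-made attribute vectors in place of learned image and text embeddings, and it assumes the instruction has already been split into attributes to remove (degradation) and attributes to add (upgradation). The names ATTRS, encode, degrade, upgrade, cosine and retrieve are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical attribute vocabulary; SSN itself operates on learned embeddings.
ATTRS = ["red", "blue", "dress", "shirt", "long", "short"]

Vector = List[float]


def encode(attrs: Dict[str, float]) -> Vector:
    """Toy 'image encoder': one dimension per attribute (hypothetical)."""
    return [attrs.get(a, 0.0) for a in ATTRS]


def degrade(ref: Vector, remove: List[str]) -> Vector:
    """Step 1 (degradation): zero out the attributes the instruction asks to
    change, leaving a 'visual prototype' that keeps what should be preserved."""
    return [0.0 if a in remove else v for a, v in zip(ATTRS, ref)]


def upgrade(proto: Vector, add: List[str], weight: float = 1.0) -> Vector:
    """Step 2 (upgradation): inject the attributes the instruction asks for."""
    return [v + weight if a in add else v for a, v in zip(ATTRS, proto)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query: Vector, gallery: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Rank gallery images by cosine similarity to the composed query."""
    scored = [(name, cosine(query, vec)) for name, vec in gallery.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Reference: a long red dress. Instruction: "make it blue" -> remove red, add blue.
reference = encode({"red": 1.0, "dress": 1.0, "long": 1.0})
prototype = degrade(reference, remove=["red"])
query = upgrade(prototype, add=["blue"])

gallery = {
    "long_blue_dress": encode({"blue": 1.0, "dress": 1.0, "long": 1.0}),
    "long_red_dress": encode({"red": 1.0, "dress": 1.0, "long": 1.0}),
    "short_blue_shirt": encode({"blue": 1.0, "shirt": 1.0, "short": 1.0}),
}
ranking = retrieve(query, gallery)
print(ranking[0][0])  # long_blue_dress
```

In this toy setting, a text-only query containing just "blue" would score the blue dress and the blue shirt equally; it is the "dress" and "long" dimensions preserved in the prototype that separate them, which mirrors the shortcut the abstract says existing methods fall into when they disregard the reference image.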
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
Methods: Focus
