SAC: Semantic Attention Composition for Text-Conditioned Image Retrieval
Surgan Jandial, Pinkesh Badjatiya, Pranit Chawla, Ayush Chopra, Mausoom Sarkar, Balaji Krishnamurthy

TL;DR
This paper introduces SAC, a novel framework for text-conditioned image retrieval that effectively combines multi-modal inputs and semantic attention to improve retrieval accuracy and flexibility.
Contribution
SAC is a new framework that simplifies text-conditioned image retrieval by splitting composition into two steps: Semantic Feature Attention ("where to see") and Semantic Feature Modification ("how to change"). It outperforms existing methods.
Findings
Achieves state-of-the-art performance on FashionIQ, Shoes, and Birds-to-Words datasets.
Supports natural language feedback of varying lengths.
Outperforms existing techniques in quantitative and qualitative evaluations.
Abstract
The ability to efficiently search for images is essential for improving the user experiences across various products. Incorporating user feedback, via multi-modal inputs, to navigate visual search can help tailor retrieved results to specific user queries. We focus on the task of text-conditioned image retrieval that utilizes support text feedback alongside a reference image to retrieve images that concurrently satisfy constraints imposed by both inputs. The task is challenging since it requires learning composite image-text features by incorporating multiple cross-granular semantic edits from text feedback and then applying the same to visual features. To address this, we propose a novel framework SAC which resolves the above in two major steps: "where to see" (Semantic Feature Attention) and "how to change" (Semantic Feature Modification). We systematically show how our architecture…
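The two steps named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: SAC's modules are learned networks, whereas the functions below (`where_to_see`, `how_to_change`, `retrieve`) are hypothetical stand-ins that use plain dot-product attention, additive modification, and cosine-similarity ranking on hand-made vectors. All names, feature values, and the choice of these operations are assumptions made only to show the data flow from a reference image and text feedback to a ranked gallery.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def where_to_see(regions, text):
    """Hypothetical stand-in for attention: weight each image region
    by its scaled dot-product similarity to the text feature."""
    scale = math.sqrt(len(text))
    return softmax([dot(r, text) / scale for r in regions])

def how_to_change(regions, weights, text):
    """Hypothetical stand-in for modification: pool the attended regions,
    shift the result by the text feature, and normalize."""
    dim = len(text)
    pooled = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]
    return l2_normalize([p + t for p, t in zip(pooled, text)])

def retrieve(query, gallery):
    """Return gallery indices ordered by cosine similarity, best match first."""
    sims = [dot(query, l2_normalize(g)) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: sims[i], reverse=True)

# Toy inputs (assumed values): two region features of the reference image,
# one text-feedback feature, and three candidate image features.
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
text = [0.0, 2.0, 1.0]
gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.5], [0.0, 0.0, -1.0]]

weights = where_to_see(regions, text)
query = how_to_change(regions, weights, text)
ranking = retrieve(query, gallery)
print(weights, ranking)
```

In this sketch the region that aligns with the text receives the larger attention weight, and the composed query ranks the candidate closest to both the attended image content and the text first.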
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
