SAC: Semantic Attention Composition for Text-Conditioned Image Retrieval

Surgan Jandial; Pinkesh Badjatiya; Pranit Chawla; Ayush Chopra; Mausoom Sarkar; Balaji Krishnamurthy

arXiv:2009.01485·cs.CV·October 22, 2021·1 cites

TL;DR

This paper introduces SAC, a framework for text-conditioned image retrieval that composes a reference image with text feedback in two steps: Semantic Feature Attention ("where to see") and Semantic Feature Modification ("how to change").

Contribution

SAC simplifies text-conditioned image retrieval by reducing image-text composition to semantic attention followed by semantic modification, and is reported to outperform existing methods.

Findings

01. Achieves state-of-the-art performance on the FashionIQ, Shoes, and Birds-to-Words datasets.

02. Supports natural language feedback of varying lengths.

03. Outperforms existing techniques in quantitative and qualitative evaluations.

Abstract

The ability to efficiently search for images is essential for improving the user experiences across various products. Incorporating user feedback, via multi-modal inputs, to navigate visual search can help tailor retrieved results to specific user queries. We focus on the task of text-conditioned image retrieval that utilizes support text feedback alongside a reference image to retrieve images that concurrently satisfy constraints imposed by both inputs. The task is challenging since it requires learning composite image-text features by incorporating multiple cross-granular semantic edits from text feedback and then applying the same to visual features. To address this, we propose a novel framework SAC which resolves the above in two major steps: "where to see" (Semantic Feature Attention) and "how to change" (Semantic Feature Modification). We systematically show how our architecture…
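As a rough illustration of the two steps named in the abstract, the pure-Python sketch below attends over image region features using a text embedding ("where to see"), blends the attended feature with the text embedding ("how to change"), and ranks a gallery by cosine similarity. It is not the authors' implementation. The function names (`attend`, `modify`, `retrieve`), the softmax over dot products, the fixed `gate` blend, and the toy 3-dimensional vectors are all assumptions chosen for brevity, whereas the framework described in the paper learns composite image-text features.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions, text):
    """'Where to see' (assumed form): weight each image region by its
    similarity to the text embedding, then pool the regions."""
    weights = softmax([dot(r, text) for r in regions])
    pooled = [
        sum(w * r[i] for w, r in zip(weights, regions))
        for i in range(len(text))
    ]
    return pooled, weights


def modify(attended, text, gate=0.5):
    """'How to change' (assumed form): blend the attended visual feature
    with the text feature using a fixed, hypothetical gate value."""
    return [(1 - gate) * a + gate * t for a, t in zip(attended, text)]


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def retrieve(regions, text, gallery, gate=0.5):
    """Rank gallery image names by similarity to the composed query."""
    attended, _ = attend(regions, text)
    query = modify(attended, text, gate)
    return sorted(gallery, key=lambda name: cosine(query, gallery[name]),
                  reverse=True)


# Toy data (hypothetical): two region features of a reference image,
# one text-feedback embedding, and a three-image gallery.
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
text = [0.0, 0.0, 1.0]
gallery = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.5, 0.5, 1.0],
    "c": [0.0, 1.0, 0.0],
}
ranked = retrieve(regions, text, gallery)
print(ranked[0])
```

With these toy values the composed query is [0.25, 0.25, 0.5], so gallery item "b", a scaled copy of that vector, ranks first. The point of the sketch is only the order of operations: attention over the reference image is conditioned on the text before the text is used to modify the pooled feature.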

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques