Image Search with Text Feedback by Additive Attention Compositional Learning

Yuxin Tian; Shawn Newsam; Kofi Boakye

arXiv:2203.03809·cs.CV·March 9, 2022·5 cites

TL;DR

This paper introduces Additive Attention Compositional Learning (AACL), a multi-modal transformer-based method for image retrieval with text feedback, and reports state-of-the-art results on three large-scale datasets: FashionIQ, Fashion200k, and Shopping100k.

Contribution

The paper proposes a novel additive attention-based composition module for multi-modal image-text retrieval and introduces a new benchmark derived from Shopping100k.

Findings

01. AACL outperforms existing methods on the FashionIQ, Fashion200k, and Shopping100k datasets.

02. The additive attention composition module effectively models image-text interactions.

03. Extensive experiments validate the superiority of AACL over strong baselines.

Abstract

Effective image retrieval with text feedback stands to impact a range of real-world applications, such as e-commerce. Given a source image and text feedback that describes the desired modifications to that image, the goal is to retrieve the target images that resemble the source yet satisfy the given modifications by composing a multi-modal (image-text) query. We propose a novel solution to this problem, Additive Attention Compositional Learning (AACL), that uses a multi-modal transformer-based architecture and effectively models the image-text contexts. Specifically, we propose a novel image-text composition module based on additive attention that can be seamlessly plugged into deep neural networks. We also introduce a new challenging benchmark derived from the Shopping100k dataset. AACL is evaluated on three large-scale datasets (FashionIQ, Fashion200k, and Shopping100k), each with…
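To make the task in the abstract concrete, the following is a minimal sketch of composing an image feature with text-token features through additive attention and then ranking a gallery by cosine similarity. It uses the generic tanh-based (Bahdanau-style) additive attention score as a stand-in and is not claimed to be the exact AACL module. All names (`additive_attention`, `compose_query`, `rank_gallery`), the residual composition step, the feature sizes, and the random weights are assumptions made for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(query, keys, W_q, W_k, v):
    """Generic additive attention: score_i = v . tanh(W_q q + W_k k_i)."""
    q_proj = matvec(W_q, query)
    scores = []
    for k in keys:
        k_proj = matvec(W_k, k)
        hidden = [math.tanh(a + b) for a, b in zip(q_proj, k_proj)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(keys[0])
    # Context vector: attention-weighted sum of the keys.
    context = [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]
    return context, weights


def compose_query(image_feat, text_tokens, W_q, W_k, v):
    """Hypothetical composition: add the attended text context to the image feature."""
    context, weights = additive_attention(image_feat, text_tokens, W_q, W_k, v)
    return [i + c for i, c in zip(image_feat, context)], weights


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    return sorted(range(len(gallery)),
                  key=lambda i: cosine(query, gallery[i]),
                  reverse=True)


if __name__ == "__main__":
    rng = random.Random(0)
    D, H = 4, 3  # assumed feature and hidden sizes, chosen only for the demo
    W_q = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(H)]
    W_k = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(H)]
    v = [rng.uniform(-1, 1) for _ in range(H)]
    image_feat = [rng.uniform(-1, 1) for _ in range(D)]
    text_tokens = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(5)]
    gallery = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(6)]

    composed, weights = compose_query(image_feat, text_tokens, W_q, W_k, v)
    print("attention weights:", [round(w, 3) for w in weights])
    print("ranking:", rank_gallery(composed, gallery))
```

In a trained system the projection weights and the image and text encoders would be learned jointly, so that the composed query lands near the target image in the embedding space; here they are random and the output only demonstrates the data flow.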

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Tanh Activation