Sentence Attention Blocks for Answer Grounding

Seyedalireza Khoshsirat; Chandra Kambhamettu

arXiv:2309.11593 · cs.CV · September 22, 2023

TL;DR

This paper introduces the Sentence Attention Block, a novel and flexible architectural component that improves answer grounding in visual question answering by explicitly modeling dependencies between image features and sentence embeddings, achieving state-of-the-art results.

Contribution

The paper proposes a new Sentence Attention Block that enhances answer grounding by re-calibrating image features based on sentence context. The block is compatible with pre-trained networks and easy to implement.

Findings

01. Achieved state-of-the-art accuracy on multiple datasets.

02. Demonstrated the block's effectiveness through ablation studies.

03. Integrates flexibly with various backbone networks.

Abstract

Answer grounding is the task of locating relevant visual evidence for the Visual Question Answering task. Although a wide variety of attention methods have been introduced for this task, they suffer from one of three problems: designs that cannot use pre-trained networks and therefore do not benefit from large-scale pre-training; custom designs that are not based on well-grounded previous designs, which limits the learning power of the network; or complicated designs that are challenging to re-implement or improve. In this paper, we propose a novel architectural block, which we term Sentence Attention Block, to solve these problems. The proposed block re-calibrates channel-wise image feature-maps by explicitly modeling inter-dependencies between the image feature-maps and the sentence embedding. We visually demonstrate how this block filters out irrelevant…
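
The channel-wise re-calibration described in the abstract can be illustrated with a small, dependency-free sketch. The abstract does not specify the layer layout, so everything concrete here is an assumption: the squeeze-and-excitation-style gate (global average pooling, concatenation with the sentence embedding, a two-layer MLP with a sigmoid output), the function name `sentence_channel_gate`, and all sizes and random weights are hypothetical and chosen only for illustration. The paper's actual block may differ.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, vec):
    """Dense layer: weights is [out][in], vec is [in]."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def sentence_channel_gate(feature_maps, sentence_emb, w1, b1, w2, b2):
    """Rescale each channel of feature_maps ([C][H][W]) by a gate in (0, 1).

    Assumed design (not confirmed by the abstract): an SE-style gate where
    per-channel spatial averages are concatenated with the sentence
    embedding and passed through a two-layer MLP.
    """
    # Squeeze: one scalar per channel via global average pooling.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in feature_maps]
    # Joint descriptor of the image channels and the sentence.
    joint = pooled + list(sentence_emb)
    hidden = [max(0.0, h) for h in linear(w1, b1, joint)]  # ReLU
    gates = [sigmoid(g) for g in linear(w2, b2, hidden)]   # one gate per channel
    # Re-calibrate: multiply every spatial position in channel c by gates[c].
    scaled = [[[gates[c] * v for v in row] for row in ch]
              for c, ch in enumerate(feature_maps)]
    return scaled, gates


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
C, H, W, D, HIDDEN = 4, 3, 3, 5, 6  # hypothetical sizes
features = [[[rng.random() for _ in range(W)] for _ in range(H)]
            for _ in range(C)]
sentence = [rng.uniform(-1, 1) for _ in range(D)]
w1, b1 = random_matrix(HIDDEN, C + D, rng), [0.0] * HIDDEN
w2, b2 = random_matrix(C, HIDDEN, rng), [0.0] * C

scaled, gates = sentence_channel_gate(features, sentence, w1, b1, w2, b2)
print([round(g, 3) for g in gates])
```

Because the gate only rescales channels and leaves the spatial shape untouched, a block of this kind could in principle be placed between layers of a pre-trained backbone without changing the tensor shapes those layers expect, which is consistent with the compatibility claim above.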

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques