Answer Questions with Right Image Regions: A Visual Attention Regularization Approach

Yibing Liu; Yangyang Guo; Jianhua Yin; Xuemeng Song; Weifeng Liu; Liqiang Nie

arXiv:2102.01916 · cs.CV · November 9, 2021 · 5 citations

TL;DR

This paper introduces AttReg, a flexible visual attention regularization method for VQA that improves visual grounding without requiring human attention data, leading to state-of-the-art results.

Contribution

AttReg improves visual attention in VQA models by directing it toward key image regions the model ignores, without human supervision, improving accuracy across multiple datasets.

Findings

01. AttReg achieved 60.00% accuracy on VQA-CP v2, a new state of the art.

02. AttReg improves visual grounding and reasoning in VQA models.

03. AttReg is effective across three benchmark datasets.

Abstract

Visual attention in Visual Question Answering (VQA) aims to locate the image regions relevant to answer prediction, offering a powerful technique for promoting multi-modal understanding. However, recent studies have pointed out that the image regions highlighted by visual attention are often irrelevant to the given question and answer, which confuses the model and hinders correct visual reasoning. To tackle this problem, existing methods mostly resort to aligning the visual attention weights with human attention. Nevertheless, gathering such human data is laborious and expensive, making it burdensome to adapt well-developed models across datasets. To address this issue, we devise a novel visual attention regularization approach, AttReg, for better visual grounding in VQA. Specifically, AttReg first identifies the image regions which are essential for…

Code & Models

Repositories

BierOne/VQA-AttReg
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning