A Focused Dynamic Attention Model for Visual Question Answering

Ilija Ilievski; Shuicheng Yan; Jiashi Feng

arXiv:1604.01485·cs.CV·April 7, 2016·130 cites

Open Access

TL;DR

This paper introduces a Focused Dynamic Attention model for VQA that uses object detection and question-driven region features to improve answer accuracy, outperforming existing methods on benchmark datasets.

Contribution

The novel FDA model effectively integrates object region features with global image features based on question keywords, enhancing VQA performance.
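The idea of question-driven region focus can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`question_keywords`, `focus_regions`, `mean_pool`, `fuse`), the exact-word label matching, the mean pooling, and the concatenation-based fusion are all assumptions chosen for brevity. It only shows the general pattern of keeping region features for objects named in the question and combining them with a global image feature.

```python
from typing import List, Sequence, Tuple

# Hypothetical detection format: (object label, region feature vector).
Detection = Tuple[str, List[float]]


def question_keywords(question: str) -> set:
    """Lowercase the question and split it into a set of bare words."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in question.lower())
    return set(cleaned.split())


def focus_regions(question: str, detections: Sequence[Detection]) -> List[List[float]]:
    """Keep features of detected regions whose label appears in the question.

    Assumption: exact word match stands in for any learned similarity.
    """
    words = question_keywords(question)
    return [feat for label, feat in detections if label.lower() in words]


def mean_pool(features: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Average the region features; return zeros when no region matched."""
    if not features:
        return [0.0] * dim
    return [sum(col) / len(features) for col in zip(*features)]


def fuse(question: str, detections: Sequence[Detection],
         global_feature: Sequence[float]) -> List[float]:
    """Concatenate pooled question-relevant region features with the global feature."""
    region_dim = len(detections[0][1]) if detections else 0
    pooled = mean_pool(focus_regions(question, detections), region_dim)
    return pooled + list(global_feature)


# Toy inputs: three detected objects with 2-d features, one 3-d global feature.
detections = [("dog", [1.0, 0.0]), ("frisbee", [0.0, 1.0]), ("tree", [5.0, 5.0])]
fused = fuse("Is the dog catching a frisbee?", detections, [0.2, 0.4, 0.6])
print(fused)
```

In this toy example the "tree" region is ignored because the question never mentions it, so the fused vector reflects only the dog and frisbee regions plus the global feature. A real system would replace each assumed piece with learned components.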

Findings

1. FDA outperforms baseline models on the VQA dataset

2. Question-driven region focus improves answer accuracy

3. Fusion of region and global features enhances understanding

Abstract

Visual Question Answering (VQA) problems are attracting increasing interest from multiple research disciplines. Solving VQA problems requires techniques from computer vision, for understanding the visual content of a presented image or video, and from natural language processing, for understanding the semantics of the question and generating the answers. For visual content modeling, most existing VQA methods extract global features from the image or video, which inevitably fails to capture fine-grained information such as the spatial configuration of multiple objects. Extracting features from auto-generated regions -- as some region-based image recognition methods do -- cannot fundamentally address this problem and may introduce an overwhelming number of features irrelevant to the question. In this work, we propose a novel Focused Dynamic…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory