Video Question Answering via Attribute-Augmented Attention Network Learning

Yunan Ye; Zhou Zhao; Yimeng Li; Long Chen; Jun Xiao; Yueting Zhuang

arXiv:1707.06355·cs.CV·July 21, 2017

TL;DR

This paper introduces an attribute-augmented attention network that models temporal dynamics and performs multi-step reasoning for improved video question answering, addressing limitations of static image-based methods.

Contribution

It proposes a novel attribute-augmented attention framework with joint attribute detection and multi-step reasoning for video question answering.

Findings

01 Effective on multiple-choice and open-ended tasks

02 Improves performance over existing methods

03 Constructed a large-scale VQA dataset

Abstract

Video question answering is a challenging problem in visual information retrieval: given a question, the system must provide an answer based on the content of the referenced video. However, existing visual question answering approaches mainly tackle static image question answering, and they may be ineffective for video question answering because they do not sufficiently model the temporal dynamics of video content. In this paper, we study the problem of video question answering by modeling its temporal dynamics with a frame-level attention mechanism. We propose an attribute-augmented attention network learning framework that enables joint frame-level attribute detection and unified video representation learning for video question answering. We then incorporate a multi-step reasoning process into the proposed attention network to further improve performance. We construct a large-scale…
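To make the idea of frame-level attention with multi-step reasoning concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify how attributes are fused, how attention scores are computed, or how the query is updated between reasoning steps. The concatenation in `augment`, the dot-product scoring in `attend`, and the additive query update in `multi_step_attend` are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def augment(frame_feats, attr_feats):
    """Assumed fusion: concatenate each frame's visual feature with its
    detected-attribute vector (the paper's actual fusion may differ)."""
    return [list(f) + list(a) for f, a in zip(frame_feats, attr_feats)]


def attend(frames, query):
    """Question-guided attention over frames.

    Scores each frame against the query (assumed dot-product scoring),
    normalizes the scores with softmax, and returns the weights together
    with the attention-weighted sum of frame features."""
    weights = softmax([dot(f, query) for f in frames])
    dim = len(frames[0])
    pooled = [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(dim)]
    return weights, pooled


def multi_step_attend(frames, question, steps=2):
    """Repeat attention `steps` times, refining the query after each pass.

    The additive update (query + pooled video feature) is an assumption."""
    query = list(question)
    weights = []
    for _ in range(steps):
        weights, pooled = attend(frames, query)
        query = [q + p for q, p in zip(query, pooled)]
    return weights, query


if __name__ == "__main__":
    # Two toy frames with 2-d features; the question vector favors frame 0.
    frames = [[1.0, 0.0], [0.0, 1.0]]
    question = [2.0, 0.0]
    one_step, _ = attend(frames, question)
    two_step, _ = multi_step_attend(frames, question, steps=2)
    print("one-step weights:", one_step)
    print("two-step weights:", two_step)
```

In this toy setup the second reasoning step concentrates more attention on the frame that already matched the question, which is the intuition behind iterating the attention. A trained model would replace the raw dot product with learned projections of the question and frame features.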
