Relation-aware Hierarchical Attention Framework for Video Question Answering

Fangtao Li; Ting Bai; Chenyu Cao; Zihe Liu; Chenghao Yan; Bin Wu

arXiv:2105.06160·cs.CV·May 17, 2021

TL;DR

This paper introduces a Relation-aware Hierarchical Attention (RHA) framework for VideoQA that models both static and dynamic relations among objects in videos to improve answer accuracy.

Contribution

The novel RHA framework effectively captures both static and dynamic relations among objects, enhancing VideoQA performance over existing methods.

Findings

01

RHA outperforms state-of-the-art methods on a large-scale VideoQA dataset.

02

Hierarchical attention effectively fuses multimodal features.

03

Dynamic relation modeling improves understanding of video content.

Abstract

Video Question Answering (VideoQA) is a challenging video understanding task because it requires a deep understanding of both the question and the video. Previous studies mainly focus on extracting sophisticated visual and language embeddings and fusing them with delicate hand-crafted networks. However, the relevance of different frames, objects, and modalities to the question varies over time, which most existing methods ignore. This lack of understanding of the dynamic relationships and interactions among objects poses a great challenge for the VideoQA task. To address this problem, we propose a novel Relation-aware Hierarchical Attention (RHA) framework to learn both the static and dynamic relations of the objects in videos. In particular, videos and questions are first embedded by pre-trained models to obtain the visual and textual features. Then a graph-based relation encoder…
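
To make the idea of question-dependent relevance at several levels concrete, here is a minimal sketch of a two-level attention: objects are weighted within each frame, then frames are weighted across time. This is an illustration under stated assumptions, not the authors' RHA implementation. The function names, the dot-product scoring, the scaling, and the toy features are all hypothetical, and the graph-based relation encoder mentioned in the abstract is not modeled here.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, vectors):
    # Convex combination of equally sized vectors.
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def hierarchical_attention(video, question):
    """video: list of frames, each a list of object feature vectors.
    question: one feature vector with the same dimension.
    Assumption: scaled dot-product scores stand in for learned attention."""
    scale = math.sqrt(len(question))
    obj_weights, frame_vecs = [], []
    for frame in video:
        # Object level: how relevant is each object to the question?
        w = softmax([dot(obj, question) / scale for obj in frame])
        obj_weights.append(w)
        frame_vecs.append(weighted_sum(w, frame))
    # Frame level: how relevant is each pooled frame to the question?
    frame_weights = softmax([dot(f, question) / scale for f in frame_vecs])
    video_vec = weighted_sum(frame_weights, frame_vecs)
    return video_vec, obj_weights, frame_weights

# Toy input (made up): 2 frames, 2 objects per frame, 2-dim features.
video = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 0.5]],
]
question = [1.0, 0.0]
video_vec, obj_weights, frame_weights = hierarchical_attention(video, question)
```

In this toy input the first object of each frame aligns with the question vector, so it receives the larger object-level weight, and the second frame, whose aligned object has the larger magnitude, receives the larger frame-level weight. A real model would replace the fixed dot products with learned projections.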

Code & Models

Repositories

Lee-Ft/RHA
PyTorch
