A Better Way to Attend: Attention with Trees for Video Question Answering

Hongyang Xue; Wenqing Chu; Zhou Zhao; Deng Cai

arXiv:1909.02218 · cs.CV · September 15, 2019

TL;DR

This paper introduces a novel syntax-aware attention model using tree structures for improved video question answering, especially on complex questions, by leveraging sentence parse trees and hierarchical attention mechanisms.

Contribution

The paper proposes the HTreeMN model that incorporates sentence parse trees into attention mechanisms, enhancing understanding of complex questions in video QA tasks.

Findings

01 Outperforms existing attention models on complex questions

02 Effective utilization of sentence parse trees improves accuracy

03 Hierarchical attention distills relevant features efficiently

Abstract

We propose a new attention model for video question answering. The main idea of attention models is to locate the most informative parts of the visual data. Attention mechanisms are now widely used; however, most existing visual attention mechanisms treat the question as a whole. They ignore word-level semantics, where each word can require different attention and some words need none, and they do not consider the semantic structure of the sentence. Although the Extended Soft Attention (E-SA) model for video question answering leverages word-level attention, it performs poorly on long question sentences. In this paper, we propose the heterogeneous tree-structured memory network (HTreeMN) for video question answering. Our approach is based on the syntax parse trees of the question sentences. The HTreeMN treats the words differently where the…
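The idea in the abstract, that attention follows the question's parse tree and that only some words attend to the video, can be illustrated with a minimal sketch. This is not the authors' HTreeMN implementation. The `Node` class, the `needs_attention` flag, the dot-product scoring, and the mean used to combine child nodes are all assumptions made for illustration, and word embeddings are assumed to share the dimension of the frame features.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, frames: List[Vector]) -> Vector:
    """Soft attention: average the frames, weighted by softmax(query . frame)."""
    weights = softmax([dot(query, f) for f in frames])
    dim = len(frames[0])
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(dim)]


@dataclass
class Node:
    """Hypothetical parse-tree node; leaves carry a word embedding."""
    embedding: Optional[Vector] = None
    needs_attention: bool = False  # assumed per-word flag, not from the paper
    children: List["Node"] = field(default_factory=list)


def encode(node: Node, frames: List[Vector]) -> Vector:
    """Encode a parse tree bottom-up against the video frame features."""
    if not node.children:
        if node.needs_attention:
            # This word attends over the video frames.
            return attend(node.embedding, frames)
        # This word needs no attention; pass its embedding through.
        return list(node.embedding)
    # Internal node: combine child encodings. A plain mean stands in for
    # whatever learned composition a real model would use.
    child_vecs = [encode(c, frames) for c in node.children]
    dim = len(child_vecs[0])
    return [sum(v[i] for v in child_vecs) / len(child_vecs) for i in range(dim)]


# Toy example: two frames, a two-word question fragment.
frames = [[1.0, 0.0], [0.0, 1.0]]
what = Node(embedding=[0.0, 1.0], needs_attention=False)
dog = Node(embedding=[10.0, 0.0], needs_attention=True)
root = Node(children=[what, dog])
question_vector = encode(root, frames)
```

In this toy setup, only the leaf flagged with `needs_attention` ever looks at the frames, and the tree structure decides the order in which word-level results are merged. A trained model would replace the fixed mean and dot-product scoring with learned parameters.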

Taxonomy

Methods: Memory Network