Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering

Jungin Park; Jiyoung Lee; Kwanghoon Sohn

arXiv:2104.14085·cs.CV·April 30, 2021

TL;DR

This paper introduces a structure-aware graph interaction network for video question answering that leverages question-conditioned visual graphs and bridged visual interactions to improve answer accuracy.

Contribution

It proposes Bridge to Answer, a framework that combines question-conditioned visual graphs with bridged visual-to-visual interactions for video question answering.

Findings

01. Outperforms state-of-the-art methods on multiple benchmarks.

02. Effectively models appearance and motion cues in videos.

03. Demonstrates the importance of question-conditioned graph interactions.

Abstract

This paper presents a novel method, termed Bridge to Answer, to infer correct answers for questions about a given video by leveraging adequate graph interactions of heterogeneous crossmodal graphs. To realize this, we learn question-conditioned visual graphs by exploiting the relation between video and question, so that each visual node, through question-to-visual interactions, encompasses both visual and linguistic cues. In addition, we propose bridged visual-to-visual interactions that incorporate two complementary kinds of visual information, appearance and motion, by placing the question graph as an intermediate bridge. This bridged architecture allows reliable message passing through the compositional semantics of the question to generate an appropriate answer. As a result, our method can learn the question-conditioned visual representations attributed to appearance and motion that show powerful…
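
The two interaction steps named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each graph is just a list of node feature vectors and stands in plain residual dot-product attention for the paper's graph interactions. The helper names (`softmax`, `attend`, `bridged_interaction_sketch`), the node counts, and the feature size are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(targets, sources):
    """Residual dot-product attention (an assumed stand-in for message passing).

    Every target node adds an attention-weighted average of the source
    nodes to its own feature vector.
    """
    dim = len(targets[0])
    updated = []
    for t in targets:
        scores = [sum(a * b for a, b in zip(t, s)) / math.sqrt(dim) for s in sources]
        weights = softmax(scores)
        message = [sum(w * s[k] for w, s in zip(weights, sources)) for k in range(dim)]
        updated.append([a + m for a, m in zip(t, message)])
    return updated


def bridged_interaction_sketch(appearance, motion, question):
    # Step 1 (question-to-visual): each visual node absorbs linguistic cues.
    app_q = attend(appearance, question)
    mot_q = attend(motion, question)
    # Step 2 (visual-to-question): the question graph collects each visual stream.
    q_from_app = attend(question, app_q)
    q_from_mot = attend(question, mot_q)
    # Step 3 (question-to-visual): each stream reads the OTHER stream through
    # the question graph, so appearance and motion never interact directly.
    app_out = attend(app_q, q_from_mot)
    mot_out = attend(mot_q, q_from_app)
    return app_out, mot_out


def random_nodes(rng, count, dim):
    """Hypothetical node features; real ones would come from video/text encoders."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
appearance = random_nodes(rng, 8, 16)  # e.g. 8 appearance nodes
motion = random_nodes(rng, 4, 16)      # e.g. 4 motion nodes
question = random_nodes(rng, 6, 16)    # e.g. 6 question-word nodes

app_out, mot_out = bridged_interaction_sketch(appearance, motion, question)
print(len(app_out), len(app_out[0]), len(mot_out), len(mot_out[0]))
```

The point of the sketch is the routing in step 3: appearance nodes only see motion information that has first passed through the question nodes, and vice versa, which is one way to read "placing the question graph as an intermediate bridge".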
