Hierarchical Conditional Relation Networks for Multimodal Video Question Answering
Thao Minh Le, Vuong Le, Svetha Venkatesh, Truyen Tran

TL;DR
This paper introduces Hierarchical Conditional Relation Networks (HCRN) for multimodal video question answering. The model captures spatio-temporal and multimodal relations conditioned on the question, and reports improved results on the TGIF-QA and TVQA benchmarks.
Contribution
It proposes a novel neural unit called Conditional Relation Network (CRN) and a hierarchical architecture (HCRN) for better content selection and relation modeling in Video QA.
Findings
Achieved consistent improvements over state-of-the-art methods on TGIF-QA and TVQA datasets.
Demonstrated the effectiveness of CRN units in flexible multimodal relation encoding.
Validated the hierarchical approach for capturing video content and associated information.
Abstract
Video QA challenges modelers on multiple fronts. Modeling video necessitates building not only spatio-temporal models for the dynamic visual channel but also multimodal structures for associated information channels such as subtitles or audio. Video QA adds at least two more layers of complexity - selecting relevant content for each channel in the context of the linguistic query, and composing spatio-temporal concepts and relations in response to the query. To address these requirements, we start with two insights: (a) content selection and relation construction can be jointly encapsulated into a conditional computational structure, and (b) video-length structures can be composed hierarchically. For (a) this paper introduces a general-reusable neural unit dubbed Conditional Relation Network (CRN) taking as input a set of tensorial objects and translating them into a new set of objects that…
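To make the "set of objects in, new set of objects out" idea concrete, here is a minimal pure-Python sketch of a CRN-like unit. It is an illustration under stated assumptions, not the authors' implementation: the abstract above is truncated, so the subset sizes (`k_max`), the subset cap (`max_subsets`), mean pooling, and the elementwise gating in `condition` are all hypothetical stand-ins for whatever learned components the paper uses.

```python
import itertools
import random
from typing import List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty collection of equal-length vectors elementwise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def condition(obj: Vector, cond: Vector) -> Vector:
    """Placeholder conditioning step: gate each feature by `cond`.

    Assumption: a trained model would use a learned sub-network here.
    """
    return [o * c for o, c in zip(obj, cond)]


def crn_unit(
    objects: Sequence[Vector],
    cond: Vector,
    k_max: int,
    max_subsets: int,
    rng: random.Random,
) -> List[Vector]:
    """Map a set of input objects to a new set of conditioned relation objects.

    For each subset size k in 2..k_max, pool up to `max_subsets` subsets of
    size k, condition each pooled vector, and average the results. The output
    holds one vector per subset size.
    """
    outputs: List[Vector] = []
    # Subsets cannot be larger than the input set itself.
    for k in range(2, min(k_max, len(objects)) + 1):
        subsets = list(itertools.combinations(objects, k))
        if len(subsets) > max_subsets:
            subsets = rng.sample(subsets, max_subsets)
        related = [condition(mean_pool(s), cond) for s in subsets]
        outputs.append(mean_pool(related))
    return outputs


if __name__ == "__main__":
    clip_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy "objects"
    question = [1.0, 0.5]  # toy conditioning vector
    print(crn_unit(clip_features, question, k_max=3, max_subsets=10,
                   rng=random.Random(0)))
```

Because the output is again a list of vectors, the result of one call can be fed to another call as its `objects`, which is one way to read insight (b) about composing structures hierarchically.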
Taxonomy
Methods: Conditional Relation Network
