Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference

Juncheng Li; Siliang Tang; Linchao Zhu; Haochen Shi; Xuanwen Huang; Fei Wu; Yi Yang; Yueting Zhuang

arXiv:2107.12270 · cs.CV · August 10, 2021

TL;DR

This paper introduces an adaptive hierarchical graph network with semantic coherence learning to improve video-and-language inference by jointly reasoning over video, subtitles, and complex social interactions.

Contribution

The paper proposes an adaptive hierarchical graph network whose graph structure adjusts to the semantic structure of the statement, and it adds semantic coherence learning to support better reasoning.

Findings

01. Significantly outperforms baseline methods.

02. Effectively models long-range relationships and social interactions.

03. Improves alignment between vision and language.

Abstract

Video-and-Language Inference is a recently proposed task for joint video-and-language understanding. It requires a model to infer whether a natural language statement entails or contradicts a given video clip. In this paper, we study how to address three critical challenges for this task: judging the global correctness of a statement that involves multiple semantic meanings, reasoning jointly over video and subtitles, and modeling long-range relationships and complex social interactions. First, we propose an adaptive hierarchical graph network that achieves an in-depth understanding of the video over complex interactions. Specifically, it performs joint reasoning over video and subtitles in three hierarchies, where the graph structure is adaptively adjusted according to the semantic structures of the statement. Second, we introduce semantic coherence learning to…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling