Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding

Juncheng Li; Siliang Tang; Linchao Zhu; Wenqiao Zhang; Yi Yang; Tat-Seng Chua; Fei Wu; Yueting Zhuang

arXiv:2301.09071·cs.CV·May 16, 2023·1 cites

TL;DR

This paper introduces a new benchmark and a novel variational cross-graph reasoning framework for temporal grounding, emphasizing structured semantics to improve compositional generalization in video-language understanding.

Contribution

It proposes a variational cross-graph reasoning model with adaptive structured semantics learning to enhance compositional generalization in temporal grounding tasks.

Findings

1. State-of-the-art methods perform poorly on new compositional queries.

2. The proposed approach significantly improves generalization to novel word combinations.

3. Structured semantic graphs are crucial for compositional reasoning in videos.

Abstract

Temporal grounding is the task of locating a specific segment in an untrimmed video according to a query sentence. This task has gained significant momentum in the computer vision community because it enables activity grounding beyond pre-defined activity classes by exploiting the semantic diversity of natural language descriptions. This semantic diversity is rooted in the linguistic principle of compositionality, whereby novel semantics can be systematically described by combining known words in novel ways (compositional generalization). However, existing temporal grounding datasets are not carefully designed to evaluate compositional generalizability. To systematically benchmark the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG. When…

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Natural Language Processing Techniques
