Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning

Juncheng Li; Junlin Xie; Long Qian; Linchao Zhu; Siliang Tang; Fei Wu; Yi Yang; Yueting Zhuang; Xin Eric Wang

arXiv:2203.13049 · cs.CV · March 29, 2022 · 5 cites

TL;DR

This paper introduces a new task and datasets for evaluating compositional generalization in temporal video grounding, revealing current models' limitations and proposing a structured variational reasoning framework that improves generalization.

Contribution

The paper presents a novel compositional temporal grounding task, new datasets Charades-CG and ActivityNet-CG, and a structured variational cross-graph reasoning model that enhances compositional generalization.

Findings

01 State-of-the-art methods fail on compositional generalization tasks.

02 Proposed model outperforms baselines in compositional generalization.

03 New datasets enable systematic evaluation of compositional generalization.

Abstract

Temporal grounding in videos aims to localize the one target video segment that semantically corresponds to a given query sentence. Thanks to the semantic diversity of natural language descriptions, temporal grounding allows activity grounding beyond pre-defined classes and has received increasing attention in recent years. This semantic diversity is rooted in the linguistic principle of compositionality, where novel semantics can be systematically described by combining known words in novel ways (compositional generalization). However, current temporal grounding datasets do not specifically test for compositional generalizability. To systematically measure the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG. Evaluating the state-of-the-art…
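To make the notion of compositional generalization concrete, the following is a minimal sketch, not the procedure used to build Charades-CG or ActivityNet-CG. It assumes each query can be reduced to a (verb, noun) pair, and the function name `classify_query` and the three labels are hypothetical, chosen only for illustration. A query counts as a novel composition when every word appears in training but the combination does not.

```python
from typing import Iterable, Tuple

# Assumption: a query is simplified to a (verb, noun) pair.
Pair = Tuple[str, str]


def classify_query(pair: Pair, train_pairs: Iterable[Pair]) -> str:
    """Label a test pair relative to the training pairs (hypothetical labels)."""
    train_set = set(train_pairs)
    # Every individual word observed anywhere in the training pairs.
    seen_words = {word for p in train_set for word in p}

    if pair in train_set:
        return "seen"  # the exact combination occurred in training
    if all(word in seen_words for word in pair):
        return "novel_composition"  # known words, new combination
    return "novel_word"  # at least one word never occurred in training


train = [("open", "door"), ("close", "laptop")]
print(classify_query(("open", "laptop"), train))  # novel_composition
```

Here "open" and "laptop" both occur in training, but never together, so the pair is a new combination of known words.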

Code & Models

Repositories

yyjmjc/compositional-temporal-grounding (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Human Pose and Action Recognition