Transform-Equivariant Consistency Learning for Temporal Sentence Grounding

Daizong Liu; Xiaoye Qu; Jianfeng Dong; Pan Zhou; Zichuan Xu; Haozhao Wang; Xing Di; Weining Lu; Yu Cheng

arXiv:2305.04123·cs.CV·May 9, 2023·2 cites

TL;DR

This paper proposes a self-supervised learning framework called ECRL for temporal sentence grounding that enhances discriminative video representations through transform-equivariant consistency, reducing reliance on large paired datasets.

Contribution

Introduction of a novel ECRL framework utilizing self-supervised consistency loss and data augmentation to improve temporal grounding accuracy.

Findings

01. Effective on ActivityNet, TACoS, and Charades-STA datasets.

02. Reduces dependency on large annotated datasets.

03. Improves boundary prediction accuracy.

Abstract

This paper addresses temporal sentence grounding (TSG). Although existing methods have achieved decent results on this task, they not only rely heavily on abundant video-query paired data for training, but also easily fall into dataset distribution bias. To alleviate these limitations, we introduce a novel Equivariant Consistency Regulation Learning (ECRL) framework to learn more discriminative query-related frame-wise representations for each video, in a self-supervised manner. Our motivation is that the temporal boundary of the query-guided activity should be consistently predicted under various video-level transformations. Concretely, we first design a series of spatio-temporal augmentations on both foreground and background video segments to generate a set of synthetic video samples. In particular, we devise a self-refine module to enhance the completeness and…
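
The consistency idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the affine temporal transform (`scale`, `shift`), the L1 penalty, and the function names `apply_temporal_transform` and `equivariant_consistency_loss` are all assumptions chosen for illustration. The sketch only shows what it means for a predicted boundary to move in step with a transformation applied to the video timeline.

```python
from typing import Tuple

Boundary = Tuple[float, float]  # (start, end) of a segment, in seconds


def apply_temporal_transform(boundary: Boundary, scale: float, shift: float) -> Boundary:
    """Map a boundary through the hypothetical transform t -> scale * t + shift."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    start, end = boundary
    return (scale * start + shift, scale * end + shift)


def equivariant_consistency_loss(
    pred_original: Boundary,
    pred_augmented: Boundary,
    scale: float,
    shift: float,
) -> float:
    """L1 gap between the transformed original prediction and the augmented prediction.

    A value of zero means the prediction moved exactly as the video timeline did,
    i.e. the predictor behaved equivariantly for this transform (assumed loss form).
    """
    expected = apply_temporal_transform(pred_original, scale, shift)
    return abs(expected[0] - pred_augmented[0]) + abs(expected[1] - pred_augmented[1])


# Usage: a clip played at 2x speed (scale=0.5) after 3 s of inserted background.
loss = equivariant_consistency_loss((2.0, 6.0), (4.5, 6.0), scale=0.5, shift=3.0)
print(loss)
```

A training loop built on this idea would presumably add such a term to the grounding loss without needing extra annotations, since the target for the augmented video is derived from the known transform rather than from new labels.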

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
