Boosting Weakly-Supervised Temporal Action Localization with Text Information

Guozhang Li; De Cheng; Xinpeng Ding; Nannan Wang; Xiaoyu Wang; Xinbo Gao

arXiv:2305.00607 · cs.CV · May 2, 2023 · 1 citation

TL;DR

This paper introduces a novel approach leveraging text descriptions to improve weakly-supervised temporal action localization by enhancing discriminative and generative objectives, leading to state-of-the-art results.

Contribution

The paper proposes a new Text-Segment Mining mechanism and a Video-text Language Completion objective to better utilize text information in WTAL, improving localization accuracy.

Findings

01. Achieved state-of-the-art performance on THUMOS14 and ActivityNet1.3.

02. Method can be seamlessly integrated into existing WTAL models.

03. Significant performance improvements demonstrated across benchmarks.

Abstract

Due to the lack of temporal annotation, current Weakly-supervised Temporal Action Localization (WTAL) methods generally suffer from over-complete or incomplete localization. In this paper, we aim to leverage text information to boost WTAL from two aspects, i.e., (a) a discriminative objective that enlarges the inter-class difference, thus reducing over-complete localization; (b) a generative objective that enhances the intra-class integrity, thus finding more complete temporal boundaries. For the discriminative objective, we propose a Text-Segment Mining (TSM) mechanism, which constructs a text description based on the action class label and uses the text as a query to mine all class-related segments. Without temporal annotation of actions, TSM compares the text query with entire videos across the dataset to mine the best-matching segments while ignoring irrelevant ones. Due…
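As a rough illustration of the text-as-query idea described for TSM, the sketch below scores each video segment against a text query embedding with cosine similarity and keeps the best-matching segments. It is a simplified toy, not the paper's implementation: the function names `cosine` and `mine_segments`, the `top_k` parameter, the two-dimensional embeddings, and the restriction to a single video are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_segments(text_query, segments, top_k=2):
    """Hypothetical helper: return indices of the top_k segments most similar
    to the text query (in temporal order), plus the per-segment scores."""
    scores = [cosine(text_query, seg) for seg in segments]
    ranked = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k]), scores


# Toy embeddings (assumed): the query stands in for an encoded class description,
# and each row stands in for one segment's visual feature.
query = [1.0, 0.0]
segments = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.0], [0.1, 0.9]]

selected, scores = mine_segments(query, segments, top_k=2)
print(selected)  # [0, 2]
```

Segments 0 and 2 point in nearly the same direction as the query and are kept, while segments 1 and 3 are ignored. A real system would learn the text and segment embeddings jointly rather than use fixed vectors.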

Code & Models

Repositories

lgzlilili/boosting-wtal (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Natural Language Processing Techniques