Exploring the Temporal Consistency for Point-Level Weakly-Supervised Temporal Action Localization
Yunchuan Ma, Laiyun Qing, Guorong Li, Yuqing Liu, Yuankai Qi, and Qingming Huang

TL;DR
This paper introduces a multi-task learning framework that leverages point supervision and self-supervised tasks to enhance temporal understanding for more accurate point-level weakly-supervised temporal action localization.
Contribution
The paper is the first to explicitly explore temporal consistency through self-supervised tasks in point-supervised action localization, improving the model's understanding of temporal relationships.
Findings
Outperforms state-of-the-art methods on four benchmarks.
Self-supervised tasks improve temporal understanding and localization accuracy.
Demonstrates the importance of modeling temporal relationships in weak supervision.
Abstract
Point-supervised Temporal Action Localization (PTAL) adopts a lightly frame-annotated paradigm (i.e., labeling only a single frame per action instance) to train a model to locate action instances within untrimmed videos. Most existing approaches design the task head with only point-supervised snippet-level classification, without explicitly modeling the temporal relationships among the frames of an action. However, understanding these temporal relationships is crucial because it helps a model understand how an action is defined and therefore benefits localizing all frames of an action. To this end, we design a multi-task learning framework that fully utilizes point supervision to boost the model's temporal understanding capability for action localization. Specifically, we design three self-supervised temporal…
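As a rough illustration of the point-supervised snippet-level classification that the abstract describes as the common baseline, the sketch below computes a cross-entropy loss only at the annotated snippets. It is not the authors' implementation: the function names, the toy scores, and the dictionary format for point labels are assumptions made for this example, and the three self-supervised tasks are not shown because the abstract is truncated before describing them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def point_supervised_loss(snippet_scores, point_labels):
    """Mean cross-entropy over the labeled snippets only.

    snippet_scores: list of T lists, each holding C class logits
                    (hypothetical output of a snippet-level classifier).
    point_labels:   dict mapping a snippet index to its class index,
                    one entry per annotated action instance (assumed format).
    Unlabeled snippets contribute nothing to the loss.
    """
    if not point_labels:
        return 0.0
    losses = []
    for t, c in point_labels.items():
        probs = softmax(snippet_scores[t])
        losses.append(-math.log(probs[c]))
    return sum(losses) / len(losses)


# Toy video: 6 snippets, 2 action classes, one labeled point per instance.
scores = [
    [0.0, 0.0],
    [2.0, 0.0],
    [3.0, 0.0],
    [0.0, 0.0],
    [0.0, 1.0],
    [0.0, 2.0],
]
points = {2: 0, 5: 1}  # snippet 2 is class 0, snippet 5 is class 1
loss = point_supervised_loss(scores, points)
```

Because only two of the six snippets carry labels here, the loss says nothing about where each action starts or ends, which is the gap the abstract attributes to classification-only task heads.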
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
