CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning
Can Zhang, Meng Cao, Dongming Yang, Jie Chen, Yuexian Zou

TL;DR
CoLA introduces a snippet contrastive learning approach with hard snippet mining to improve weakly-supervised temporal action localization, leading to more accurate action boundary detection and state-of-the-art results.
Contribution
The paper proposes a novel contrastive learning framework with hard snippet mining for weakly-supervised temporal action localization, enhancing snippet discrimination and boundary precision.
Findings
CoLA achieves state-of-the-art performance on the THUMOS'14 and ActivityNet v1.2 datasets.
The SniCo Loss improves feature representation of hard snippets.
Hard snippet mining effectively identifies challenging segments for better localization.
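To make the idea of a snippet contrastive loss concrete, here is a minimal sketch that assumes an InfoNCE-style formulation: a hard snippet's feature (the query) is pulled toward an easy snippet of the same kind (the positive) and pushed away from easy snippets of the opposite kind (the negatives). This summary does not give the exact form of the SniCo Loss, so the function name `info_nce`, the cosine-similarity scoring, and the `temperature=0.07` default are all assumptions made for illustration, not the authors' implementation.

```python
import math

def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)

def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE-style loss for one query snippet (hypothetical helper).

    query:     feature of a hard snippet
    positive:  feature of an easy snippet of the same kind
    negatives: features of easy snippets of the opposite kind
    temperature: assumed value, not taken from the summary
    """
    q = l2_normalize(query)
    pos = dot(q, l2_normalize(positive)) / temperature
    negs = [dot(q, l2_normalize(n)) / temperature for n in negatives]
    logits = [pos] + negs
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    # Equivalent to -log(exp(pos) / sum(exp(logits))).
    return log_denom - pos

# A query aligned with its positive gives a lower loss than one aligned with a negative.
loss_aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Minimizing a loss of this shape moves the hard snippet's representation closer to the easy snippets it should resemble, which is the effect the findings attribute to the SniCo Loss.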
Abstract
Weakly-supervised temporal action localization (WS-TAL) aims to localize actions in untrimmed videos with only video-level labels. Most existing models follow the "localization by classification" procedure: locate temporal regions contributing most to the video-level classification. Generally, they process each snippet (or frame) individually and thus overlook the fruitful temporal context relation. Here arises the single snippet cheating issue: "hard" snippets are too vague to be classified. In this paper, we argue that learning by comparing helps identify these hard snippets and we propose to utilize snippet Contrastive learning to Localize Actions, CoLA for short. Specifically, we propose a Snippet Contrast (SniCo) Loss to refine the hard snippet representation in feature space, which guides the network to perceive precise temporal boundaries and avoid the temporal interval…
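The "localization by classification" procedure described in the abstract can be sketched as thresholding per-snippet class scores and grouping consecutive above-threshold snippets into segments. This is a simplified illustration, not the paper's pipeline: the function name `localize`, the fixed `threshold=0.5`, and the example scores are assumptions.

```python
def localize(scores, threshold=0.5):
    """Group consecutive snippets whose score meets the threshold into segments.

    scores:    per-snippet scores for one action class (hypothetical values)
    threshold: assumed cut-off, not taken from the summary
    Returns a list of (start_index, end_index) pairs, both inclusive.
    """
    segments = []
    start = None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i                        # a segment opens here
        elif s < threshold and start is not None:
            segments.append((start, i - 1))  # the segment closed on the previous snippet
            start = None
    if start is not None:
        segments.append((start, len(scores) - 1))  # segment runs to the end of the video
    return segments

# The snippet scored 0.45 is "hard": it sits just under the threshold and
# splits what may be a single action into two detected segments.
segments = localize([0.1, 0.8, 0.9, 0.45, 0.7, 0.2])
```

Because each snippet is judged on its own score, one ambiguous snippet can break or truncate a segment, which illustrates why the abstract argues for comparing snippets against each other rather than classifying them in isolation.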
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
Methods: Contrastive Learning · COLA
