Convex Combination Consistency between Neighbors for Weakly-supervised Action Localization
Qinying Liu, Zilei Wang, Ruoxi Chen, Zhilin Li

TL;DR
This paper introduces C³BN, a weakly-supervised action localization method that improves action boundary quality by forming convex combinations of neighboring snippets and applying consistency regularization to them.
Contribution
The paper proposes an approach to weakly-supervised temporal action localization that combines a micro data augmentation strategy over adjacent snippets with consistency regularization.
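The combination of augmentation and consistency regularization can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mix_neighbors` and `consistency_loss`, the fixed mixing weight `lam`, the scalar-valued `predict` callable, and the squared-error penalty are all assumptions made for illustration. In particular, the sketch assumes the consistency target for a mixed snippet is the same convex combination of the predictions on its two parent snippets, which the summary here does not spell out.

```python
from typing import Callable, List

Snippet = List[float]


def mix_neighbors(features: List[Snippet], lam: float) -> List[Snippet]:
    """Convex combination lam * x_t + (1 - lam) * x_{t+1} for each adjacent pair.

    Hypothetical helper: returns len(features) - 1 mixed snippets.
    """
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(x_t, x_next)]
        for x_t, x_next in zip(features, features[1:])
    ]


def consistency_loss(
    predict: Callable[[Snippet], float],
    features: List[Snippet],
    lam: float,
) -> float:
    """Mean squared gap between predictions on mixed snippets and mixed predictions.

    Assumption: the target for a mixed snippet is the same convex
    combination of the predictions on its two parent snippets.
    """
    preds = [predict(x) for x in features]
    mixed_preds = [predict(x) for x in mix_neighbors(features, lam)]
    targets = [lam * p + (1.0 - lam) * q for p, q in zip(preds, preds[1:])]
    if not targets:
        return 0.0
    return sum((m - t) ** 2 for m, t in zip(mixed_preds, targets)) / len(targets)


if __name__ == "__main__":
    feats = [[0.0, 1.0], [1.0, 3.0], [2.0, 2.0]]
    # A linear scorer is already consistent under convex combination,
    # so its loss is zero up to floating-point error.
    print(consistency_loss(lambda x: sum(x), feats, lam=0.5))
    # A nonlinear scorer generally is not, so the loss is positive.
    print(consistency_loss(lambda x: x[0] ** 2, feats, lam=0.5))
```

In this toy setting the loss penalizes a scorer whose output changes non-smoothly between two adjacent snippets, which is the intuition behind regularizing predictions in between neighbors.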
Findings
C³BN improves the accuracy of localized action boundaries.
The method makes the model more robust to the subtle variations between adjacent snippets.
In experiments, C³BN outperforms baseline methods.
Abstract
Weakly-supervised temporal action localization (WTAL) intends to detect action instances with only weak supervision, e.g., video-level labels. The current de facto pipeline locates action instances by thresholding and grouping continuous high-score regions on temporal class activation sequences. In this route, the capacity of the model to recognize the relationships between adjacent snippets is of vital importance, since it determines the quality of the action boundaries. However, it is error-prone since the variations between adjacent snippets are typically subtle, and unfortunately this is overlooked in the literature. To tackle the issue, we propose a novel WTAL approach named Convex Combination Consistency between Neighbors (C³BN). C³BN consists of two key ingredients: a micro data augmentation strategy that increases the diversity in-between adjacent snippets by convex…
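The thresholding-and-grouping step mentioned in the abstract can be sketched as follows. This is a simplified illustration, not the paper's code: the function name `localize`, the single fixed `threshold`, and the toy score sequence are assumptions, and practical pipelines typically add further steps (for example multiple thresholds and proposal scoring) that are not shown here.

```python
from typing import List, Tuple


def localize(scores: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Group consecutive snippets whose score exceeds `threshold`.

    `scores` is a class activation sequence for one action class, one
    value per snippet. Returns half-open (start, end) snippet index ranges.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for t, s in enumerate(scores):
        if s > threshold and start is None:
            start = t                      # a high-score region begins
        elif s <= threshold and start is not None:
            segments.append((start, t))    # the region ended at t - 1
            start = None
    if start is not None:                  # region runs to the last snippet
        segments.append((start, len(scores)))
    return segments


if __name__ == "__main__":
    cas = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3, 0.1, 0.6, 0.65, 0.2]
    print(localize(cas, threshold=0.5))  # [(2, 5), (7, 9)]
```

Because each boundary falls exactly where the score crosses the threshold between two adjacent snippets, small errors in how the model scores neighboring snippets translate directly into boundary errors.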
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
