Weakly-guided Self-supervised Pretraining for Temporal Activity Detection
Kumara Kahatapitiya, Zhou Ren, Haoxiang Li, Zhenyu Wu, Michael S. Ryoo, and Gang Hua

TL;DR
This paper introduces a weakly-guided self-supervised pretraining approach for temporal activity detection that leverages classification data to generate pseudo labels, improving detection performance without requiring additional annotations.
Contribution
The authors propose a novel self-supervised pretraining method that uses weak labels to create a detection-oriented pretext task, bridging the gap between classification pretraining and detection fine-tuning.
Findings
Outperforms previous methods on Charades and MultiTHUMOS benchmarks.
Effectively generates frame-level pseudo labels from classification data.
Provides insights on optimal usage of the proposed models for activity detection.
Abstract
Temporal Activity Detection aims to predict activity classes per frame, in contrast to video-level predictions in Activity Classification (i.e., Activity Recognition). Due to the expensive frame-level annotations required for detection, the scale of detection datasets is limited. Thus, previous work on temporal activity detection commonly resorts to fine-tuning a classification model pretrained on large-scale classification datasets (e.g., Kinetics-400). However, such pretrained models are not ideal for downstream detection, due to the disparity between the pretraining and the downstream fine-tuning tasks. In this work, we propose a novel 'weakly-guided self-supervised' pretraining method for detection. We leverage weak labels (classification) to introduce a self-supervised pretext task (detection) by generating frame-level pseudo labels, multi-action frames, and action segments.…
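The abstract is truncated here, so the exact augmentations the authors use are not stated. As a rough illustration of the general idea only, the sketch below shows one way video-level class labels could be turned into frame-level pseudo labels: clips from a classification dataset are concatenated in time to form action segments, and two clips are blended to form multi-action frames. The function names `concat_segments` and `blend_clips`, the array layout `(T, H, W, C)`, and the blending weight are all assumptions made for this example, not the paper's implementation.

```python
import numpy as np


def concat_segments(clips, labels, num_classes):
    """Concatenate clips in time and give every frame its clip's class.

    Hypothetical helper: `clips` is a list of arrays shaped (T_i, H, W, C),
    `labels` holds one video-level class id per clip.
    """
    frames = np.concatenate(clips, axis=0)
    frame_labels = np.zeros((frames.shape[0], num_classes), dtype=np.float32)
    start = 0
    for clip, label in zip(clips, labels):
        end = start + clip.shape[0]
        # Each frame inherits the weak (video-level) label of its source clip.
        frame_labels[start:end, label] = 1.0
        start = end
    return frames, frame_labels


def blend_clips(clip_a, label_a, clip_b, label_b, num_classes, weight=0.5):
    """Blend two clips frame by frame to mimic multi-action frames.

    Hypothetical helper: the pseudo label of each blended frame is a soft
    mixture of the two class ids (an assumption made for this sketch).
    """
    length = min(clip_a.shape[0], clip_b.shape[0])
    mixed = weight * clip_a[:length] + (1.0 - weight) * clip_b[:length]
    frame_labels = np.zeros((length, num_classes), dtype=np.float32)
    frame_labels[:, label_a] += weight
    frame_labels[:, label_b] += 1.0 - weight
    return mixed, frame_labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip_a = rng.random((4, 8, 8, 3))  # 4 frames of class 1
    clip_b = rng.random((6, 8, 8, 3))  # 6 frames of class 3
    frames, seg_labels = concat_segments([clip_a, clip_b], [1, 3], num_classes=5)
    mixed, mix_labels = blend_clips(clip_a, 1, clip_b, 3, num_classes=5)
    print(frames.shape, seg_labels.shape, mixed.shape, mix_labels.shape)
```

A detection model pretrained on such samples would be asked for per-frame predictions rather than a single video-level prediction, which is the gap between classification pretraining and detection fine-tuning that the abstract describes.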
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Context-Aware Activity Recognition Systems
