Stable Mean Teacher for Semi-supervised Video Action Detection
Akash Kumar, Sirshapan Mitra, Yogesh Singh Rawat

TL;DR
This paper introduces Stable Mean Teacher, a semi-supervised learning framework for video action detection that enhances pseudo label quality and temporal consistency, significantly improving performance on multiple benchmarks.
Contribution
The paper proposes an Error Recovery (EoR) module, which improves the teacher's pseudo labels, and a Difference of Pixels (DoP) constraint, which enforces temporal coherence, for semi-supervised video action detection.
Findings
Outperforms supervised baselines by large margins on multiple benchmarks.
Achieves competitive results with only 10-20% labeled data.
Demonstrates generalization to other video tasks like segmentation.
Abstract
In this work, we focus on semi-supervised learning for video action detection. Video action detection requires spatiotemporal localization in addition to classification, and a limited amount of labels makes the model prone to unreliable predictions. We present Stable Mean Teacher, a simple end-to-end teacher-based framework that benefits from improved and temporally consistent pseudo labels. It relies on a novel Error Recovery (EoR) module, which learns from students' mistakes on labeled samples and transfers this knowledge to the teacher to improve pseudo labels for unlabeled samples. Moreover, existing spatiotemporal losses do not take temporal coherency into account and are prone to temporal inconsistencies. To address this, we present Difference of Pixels (DoP), a simple and novel constraint focused on temporal consistency, leading to coherent temporal detections. We evaluate our…
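The two ideas named in the abstract, a teacher that tracks the student and a temporal-consistency constraint, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact formulation of EoR or DoP, so the function names (`ema_update`, `temporal_difference`, `dop_loss`), the `decay` value, and the choice of a mean squared error over frame-to-frame differences are all assumptions made for illustration. The EMA update is the standard mean-teacher rule; the loss is only one plausible reading of a "difference of pixels" constraint on (T, H, W) localization maps.

```python
import numpy as np


def ema_update(teacher, student, decay=0.99):
    """Standard mean-teacher rule: each teacher weight becomes an
    exponential moving average of the matching student weight."""
    return {k: decay * teacher[k] + (1.0 - decay) * student[k] for k in teacher}


def temporal_difference(maps):
    """Frame-to-frame difference of a (T, H, W) localization map,
    giving an array of shape (T - 1, H, W)."""
    return maps[1:] - maps[:-1]


def dop_loss(student_maps, teacher_maps):
    """Hypothetical DoP-style loss (an assumption, not the paper's formula):
    mean squared error between the temporal differences of the student's
    and the teacher's localization maps."""
    d_student = temporal_difference(student_maps)
    d_teacher = temporal_difference(teacher_maps)
    return float(np.mean((d_student - d_teacher) ** 2))


if __name__ == "__main__":
    # Toy weights: the teacher moves 10% of the way toward the student.
    teacher = {"w": np.zeros(3)}
    student = {"w": np.ones(3)}
    print(ema_update(teacher, student, decay=0.9)["w"])

    # Toy maps with T=3, H=W=1: a flickering student against a steady teacher.
    steady = np.zeros((3, 1, 1))
    flicker = np.array([0.0, 1.0, 0.0]).reshape(3, 1, 1)
    print(dop_loss(flicker, steady))
```

In this sketch a constant offset between student and teacher maps gives zero loss, because only the change between consecutive frames is compared; a prediction that flickers from frame to frame is penalized. A per-frame spatial loss would be needed alongside it to penalize the offset itself.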
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods
