Learning Discriminative Spatio-temporal Representations for Semi-supervised Action Recognition
Yu Wang, Sanping Zhou, Kun Xia, Le Wang

TL;DR
This paper introduces a semi-supervised action recognition framework that sharpens spatio-temporal discrimination through adaptive contrastive learning and multi-scale temporal modeling, improving accuracy on the UCF101, HMDB51, and Kinetics400 benchmarks.
Contribution
The paper proposes two techniques, Adaptive Contrastive Learning (ACL) and Multi-scale Temporal Learning (MTL), and integrates them into a unified framework that better distinguishes similar actions when labeled data are limited.
Findings
Outperforms state-of-the-art methods on UCF101, HMDB51, and Kinetics400
Effectively leverages unlabeled data for improved recognition
Enhances discriminative spatio-temporal feature learning
Abstract
Semi-supervised action recognition aims to improve spatio-temporal reasoning using a small amount of labeled data together with a large amount of unlabeled data. Despite recent advances, existing methods remain prone to ambiguous predictions when labeled data are scarce, which shows up as difficulty distinguishing different actions that share similar spatio-temporal information. In this paper, we approach this problem by equipping the model with two capabilities, namely discriminative spatial modeling and temporal structure modeling, for learning discriminative spatio-temporal representations. Specifically, we propose an Adaptive Contrastive Learning (ACL) strategy. It assesses the confidence of all unlabeled samples using the class prototypes of the labeled data, and adaptively selects positive-negative samples from a pseudo-labeled sample bank to construct contrastive…
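The prototype-based confidence step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine similarity, the softmax temperature, the fixed confidence threshold, and every function name below are assumptions made for illustration, and the toy 2-D features stand in for real video embeddings.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def class_prototypes(features, labels):
    """Prototype per class: mean feature of that class's labeled samples."""
    by_class = {}
    for feat, label in zip(features, labels):
        by_class.setdefault(label, []).append(feat)
    return {label: mean_vector(feats) for label, feats in by_class.items()}

def pseudo_label(feature, prototypes, temperature=0.1):
    """Return (label, confidence) from a softmax over prototype similarities.

    The temperature is a hypothetical hyperparameter, not taken from the paper.
    """
    classes = sorted(prototypes)
    logits = [cosine(feature, prototypes[c]) / temperature for c in classes]
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(classes)), key=probs.__getitem__)
    return classes[best], probs[best]

def select_pairs(anchor_label, bank, threshold=0.8):
    """Split confident bank entries into positives and negatives for an anchor.

    `bank` is a list of (pseudo_label, confidence) tuples; entries below the
    (assumed) threshold are ignored entirely.
    """
    positives, negatives = [], []
    for index, (label, conf) in enumerate(bank):
        if conf < threshold:
            continue
        (positives if label == anchor_label else negatives).append(index)
    return positives, negatives

# Toy data: two classes with well-separated labeled features.
labeled_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labeled_ys = [0, 0, 1, 1]
unlabeled_feats = [[1.0, 0.1], [0.0, 1.0], [0.7, 0.7], [0.9, 0.0]]

prototypes = class_prototypes(labeled_feats, labeled_ys)
bank = [pseudo_label(f, prototypes) for f in unlabeled_feats]
positives, negatives = select_pairs(anchor_label=0, bank=bank)
print(positives, negatives)
```

In this toy run, the third unlabeled sample sits equally close to both prototypes, so its confidence is about 0.5 and it is left out of both sets. A real pipeline would also exclude the anchor itself from its positives and feed the selected indices into a contrastive loss.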
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis
Methods: Contrastive Language-Image Pre-training
