SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-supervised Skeleton-based Action Recognition
Cong Wu, Xiao-Jun Wu, Josef Kittler, Tianyang Xu, Sara Atito, Muhammad Awais, Zhenhua Feng

TL;DR
SCD-Net introduces a contrastive learning framework that disentangles spatial and temporal clues in skeleton sequences, improving performance on action recognition, retrieval, and transfer learning tasks.
Contribution
The paper proposes a contrastive learning framework that explicitly disentangles spatial and temporal clues and adds a masking strategy with structural constraints, targeting self-supervised skeleton-based action recognition.
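This summary does not describe the internals of the decoupling module, so the sketch below is only a naive, pooling-based stand-in to make "spatial clue" versus "temporal clue" concrete. It assumes an encoder feature map laid out as (N, C, T, V) (batch, channels, frames, joints); the function `decouple_clues`, the projection matrices, and all sizes are hypothetical.

```python
import numpy as np

def decouple_clues(features, w_spatial, w_temporal):
    """Split an encoder feature map into spatial and temporal clue vectors.

    Hypothetical stand-in, not the paper's module.
    features:   (N, C, T, V) array, assumed layout: batch, channels, frames, joints
    w_spatial:  (C * V, D) hypothetical projection for the spatial branch
    w_temporal: (C * T, D) hypothetical projection for the temporal branch
    """
    n = features.shape[0]
    # Spatial clue: average over frames, keeping per-joint structure.
    spatial = features.mean(axis=2).reshape(n, -1)   # (N, C*V)
    # Temporal clue: average over joints, keeping per-frame dynamics.
    temporal = features.mean(axis=3).reshape(n, -1)  # (N, C*T)
    return spatial @ w_spatial, temporal @ w_temporal  # (N, D) each

# Toy usage with random features; all sizes are illustrative.
rng = np.random.default_rng(0)
N, C, T, V, D = 4, 16, 32, 25, 64
feats = rng.standard_normal((N, C, T, V))
w_s = rng.standard_normal((C * V, D))
w_t = rng.standard_normal((C * T, D))
z_s, z_t = decouple_clues(feats, w_s, w_t)
print(z_s.shape, z_t.shape)  # (4, 64) (4, 64)
```

Because the spatial branch averages over frames, shuffling the frame order leaves its output unchanged, which is one simple way to see that the two branches carry different information.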
Findings
Outperforms state-of-the-art methods on multiple datasets
Effective in action recognition, retrieval, and transfer learning
Enhances contextual understanding through structural masking
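The summary only says the masking uses "structural constraints". One plausible reading, sketched below as an assumption rather than the authors' scheme, is to hide whole body parts over a contiguous run of frames instead of dropping independent random joints. The function `structural_mask`, the toy 10-joint grouping `toy_parts`, and every parameter are hypothetical.

```python
import numpy as np

def structural_mask(num_frames, num_joints, parts, num_parts_masked, segment_len, rng):
    """Return a boolean (T, V) mask where True marks hidden entries.

    Hypothetical scheme: whole body parts (joint groups in `parts`) are hidden
    over one contiguous run of frames, so the hidden region has to be inferred
    from the surrounding spatial and temporal context.
    """
    mask = np.zeros((num_frames, num_joints), dtype=bool)
    chosen = rng.choice(len(parts), size=num_parts_masked, replace=False)
    start = rng.integers(0, num_frames - segment_len + 1)  # upper bound is exclusive
    for p in chosen:
        mask[start:start + segment_len, parts[p]] = True
    return mask

rng = np.random.default_rng(0)
toy_parts = [[0, 1], [2, 3, 4], [5, 6, 7], [8, 9]]  # hypothetical 10-joint grouping
m = structural_mask(num_frames=20, num_joints=10, parts=toy_parts,
                    num_parts_masked=2, segment_len=5, rng=rng)
x = rng.standard_normal((20, 10, 3))        # toy sequence: (frames, joints, xyz)
x_masked = np.where(m[..., None], 0.0, x)   # zero out the hidden joints
print(int(m.sum()), "of", m.size, "entries masked")
```

Compared with independent per-joint masking, this leaves no visible neighbouring joint in the same part and no visible neighbouring frame inside the masked segment, which forces longer-range context to be used.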
Abstract
Contrastive learning has achieved great success in skeleton-based action recognition. However, most existing approaches encode skeleton sequences as entangled spatiotemporal representations and confine the contrasts to the same level of representation. Instead, this paper introduces a novel contrastive learning framework, namely the Spatiotemporal Clues Disentanglement Network (SCD-Net). Specifically, we integrate a decoupling module with a feature extractor to derive explicit clues from the spatial and temporal domains, respectively. To train SCD-Net, we construct a global anchor and encourage interaction between the anchor and the extracted clues. Further, we propose a new masking strategy with structural constraints to strengthen contextual associations, bringing recent developments in masked image modelling into the proposed SCD-Net. We conduct extensive…
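The abstract does not specify the objective used for the anchor-clue interaction. As an assumption, the sketch below uses the standard InfoNCE loss with in-batch negatives and contrasts a global anchor against each clue separately; `info_nce`, `anchor_clue_loss`, the temperature, and the pairing are hypothetical choices, not SCD-Net's published loss.

```python
import numpy as np

def info_nce(queries, keys, temperature=0.1):
    """InfoNCE with in-batch negatives: row i of `queries` should match row i of `keys`."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature                  # (N, N) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))              # positives lie on the diagonal

def anchor_clue_loss(anchor, spatial_clue, temporal_clue, temperature=0.1):
    """Hypothetical pairing: contrast the global anchor against each clue."""
    return (info_nce(anchor, spatial_clue, temperature)
            + info_nce(anchor, temporal_clue, temperature))

rng = np.random.default_rng(0)
anchor = rng.standard_normal((8, 64))
aligned = anchor + 0.01 * rng.standard_normal((8, 64))  # clues agreeing with the anchor
unrelated = rng.standard_normal((8, 64))                 # clues carrying no shared signal
l_good = anchor_clue_loss(anchor, aligned, aligned)
l_bad = anchor_clue_loss(anchor, unrelated, unrelated)
print(l_good < l_bad)  # True
```

Contrasting the anchor with each clue separately, rather than with one fused vector, pushes both the spatial and the temporal branch to stay predictive of the same sample.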
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Multimodal Machine Learning Applications
Methods: Contrastive Learning
