SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-supervised Skeleton-based Action Recognition

Cong Wu; Xiao-Jun Wu; Josef Kittler; Tianyang Xu; Sara Atito; Muhammad Awais; Zhenhua Feng

arXiv:2309.05834 · cs.CV · September 13, 2023 · 1 citation

Open Access

TL;DR

SCD-Net is a contrastive learning framework that disentangles spatial and temporal clues in skeleton sequences, with reported gains on action recognition, action retrieval, and transfer learning.

Contribution

The paper proposes a new contrastive learning framework with explicit spatiotemporal clue disentanglement and a masking strategy, advancing skeleton-based action recognition.

Findings

01 Outperforms state-of-the-art methods on multiple datasets

02 Effective in action recognition, retrieval, and transfer learning

03 Enhances contextual understanding through structural masking

Abstract

Contrastive learning has achieved great success in skeleton-based action recognition. However, most existing approaches encode the skeleton sequences as entangled spatiotemporal representations and confine the contrasts to the same level of representation. Instead, this paper introduces a novel contrastive learning framework, namely Spatiotemporal Clues Disentanglement Network (SCD-Net). Specifically, we integrate the decoupling module with a feature extractor to derive explicit clues from spatial and temporal domains respectively. As for the training of SCD-Net, with a constructed global anchor, we encourage the interaction between the anchor and extracted clues. Further, we propose a new masking strategy with structural constraints to strengthen the contextual associations, leveraging the latest development from masked image modelling into the proposed SCD-Net. We conduct extensive…
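The anchor-to-clue contrast described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the pooling-based `disentangle` function stands in for the decoupling module, the pairing of one view's global anchor with the other view's spatial and temporal clues is hypothetical, and InfoNCE is used as a generic contrastive loss. The tensor layout `(batch, frames, joints, channels)` and the sizes in the demo are also illustrative choices.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def info_nce(query, key, temperature=0.1):
    """InfoNCE loss over a batch: row i of `query` and row i of `key` form the
    positive pair; every other row of `key` acts as a negative."""
    logits = l2_normalize(query) @ l2_normalize(key).T / temperature  # (B, B)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def disentangle(features):
    """ASSUMPTION: stand-in for the decoupling module, using plain pooling.
    `features` has shape (batch, frames, joints, channels)."""
    spatial = features.mean(axis=1).max(axis=1)   # average frames, then max over joints
    temporal = features.mean(axis=2).max(axis=1)  # average joints, then max over frames
    global_anchor = features.mean(axis=(1, 2))    # average over frames and joints
    return spatial, temporal, global_anchor


def anchor_to_clue_loss(view_a, view_b, temperature=0.1):
    """Contrast the global anchor of one view with the spatial and temporal
    clues of the other view (hypothetical pairing, symmetrised)."""
    s_a, t_a, g_a = disentangle(view_a)
    s_b, t_b, g_b = disentangle(view_b)
    loss = info_nce(g_a, s_b, temperature) + info_nce(g_a, t_b, temperature)
    loss += info_nce(g_b, s_a, temperature) + info_nce(g_b, t_a, temperature)
    return loss / 4.0


rng = np.random.default_rng(0)
clean = rng.normal(size=(8, 16, 25, 32))             # 8 clips, 16 frames, 25 joints, 32 channels
view_a = clean + 0.1 * rng.normal(size=clean.shape)  # two noisy "augmented" views
view_b = clean + 0.1 * rng.normal(size=clean.shape)
print(anchor_to_clue_loss(view_a, view_b))
```

The point of the sketch is only the shape of the objective: the contrast crosses representation levels (global anchor versus spatial or temporal clue) rather than staying within a single entangled embedding. A real encoder, learned projection heads, and the structural masking strategy are all omitted here.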

Taxonomy

Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Multimodal Machine Learning Applications

Methods: Contrastive Learning