Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence

Ruizhuo Xu; Linzhi Huang; Mei Wang; Jiani Hu; Weihong Deng

arXiv:2401.00921 · cs.CV · January 3, 2024 · 2 cites

TL;DR

Skeleton2vec is a self-supervised learning framework for skeleton-based action recognition that predicts high-level contextualized features for masked regions and uses a motion-aware masking strategy, outperforming previous methods on NTU-60, NTU-120, and PKU-MMD.

Contribution

The paper proposes a transformer-based teacher encoder that produces contextualized target representations, and a motion-aware tube masking strategy that strengthens spatio-temporal learning.
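
The motion-aware tube masking idea can be illustrated with a minimal sketch. This page does not describe the procedure in detail, so the specifics below are assumptions rather than the authors' method: motion is measured as summed absolute frame-to-frame displacement per joint, joints are sampled for masking with probability proportional to that motion, and the whole sequence is treated as a single tube so that the same joints are masked in every frame. The names `joint_motion` and `motion_aware_tube_mask` are hypothetical.

```python
import random


def joint_motion(seq):
    """Summed absolute frame-to-frame displacement for each joint.

    seq is a list of frames; each frame is a list of (x, y, z) joints.
    """
    num_joints = len(seq[0])
    motion = [0.0] * num_joints
    for prev, cur in zip(seq, seq[1:]):
        for j in range(num_joints):
            motion[j] += sum(abs(c - p) for p, c in zip(prev[j], cur[j]))
    return motion


def motion_aware_tube_mask(seq, mask_ratio=0.5, seed=0, eps=1e-6):
    """Return a frames x joints boolean mask (True means masked).

    Assumption: joints that move more are more likely to be masked, and
    the chosen joints stay masked across all frames (one tube).
    """
    rng = random.Random(seed)
    motion = joint_motion(seq)
    num_joints = len(motion)
    num_masked = int(round(mask_ratio * num_joints))

    # Weighted sampling without replacement; eps keeps static joints
    # selectable so the weights never sum to zero.
    remaining = list(range(num_joints))
    masked = set()
    for _ in range(num_masked):
        weights = [motion[j] + eps for j in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        masked.add(pick)
        remaining.remove(pick)

    # Repeat the same joint selection for every frame of the tube.
    return [[j in masked for j in range(num_joints)] for _ in seq]
```

Calling `motion_aware_tube_mask(seq, mask_ratio=0.9)` on a sequence of 25-joint frames would mask roughly 22 joints in every frame, biased toward the joints that move most. A real implementation would likely split long sequences into several temporal tubes and operate on batched tensors.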

Findings

01. Outperforms previous methods on NTU-60, NTU-120, and PKU-MMD datasets.

02. Achieves state-of-the-art results in skeleton-based action recognition.

03. Utilizes high-level features and motion priors for improved self-supervised learning.

Abstract

Self-supervised pre-training paradigms have been extensively explored in the field of skeleton-based action recognition. In particular, methods based on masked prediction have pushed the performance of pre-training to a new height. However, these methods take low-level features, such as raw joint coordinates or temporal motion, as prediction targets for the masked regions, which is suboptimal. In this paper, we show that using high-level contextualized features as prediction targets can achieve superior performance. Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework, which utilizes a transformer-based teacher encoder taking unmasked training samples as input to create latent contextualized representations as prediction targets. Benefiting from the self-attention mechanism, the latent representations generated by the…
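
The mechanism the abstract describes, a teacher that encodes the unmasked sequence to produce targets for a student that only sees a masked sequence, can be sketched in a few lines. The sketch is illustrative only and rests on stated assumptions: `toy_encoder` is a hypothetical stand-in that mixes each token with the sequence mean (a crude proxy for self-attention, not a transformer), the zero mask token and the mean-squared-error loss are assumptions, and how the teacher's weights are obtained is not covered by the excerpt above.

```python
def toy_encoder(tokens, weight):
    """Hypothetical stand-in for a transformer encoder.

    Each output token is weight * (token + sequence mean), so every
    output depends on the whole input sequence.
    """
    dim = len(tokens[0])
    mean = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    return [[weight * (t[d] + mean[d]) for d in range(dim)] for t in tokens]


def masked_prediction_loss(tokens, mask, student_weight, teacher_weight,
                           mask_token=0.0):
    """Mean squared error between student predictions and teacher targets,
    computed at masked positions only (mask[i] is True when token i is masked).
    """
    if not any(mask):
        raise ValueError("at least one token must be masked")

    # The teacher sees the full, unmasked sequence and yields the targets.
    targets = toy_encoder(tokens, teacher_weight)

    # The student sees the sequence with masked tokens replaced.
    masked_input = [
        [mask_token] * len(t) if m else list(t)
        for t, m in zip(tokens, mask)
    ]
    preds = toy_encoder(masked_input, student_weight)

    # Regress onto the teacher's representation where the input was hidden.
    errors = [
        (p - q) ** 2
        for pred, tgt, m in zip(preds, targets, mask) if m
        for p, q in zip(pred, tgt)
    ]
    return sum(errors) / len(errors)
```

The point of the design is visible even in this toy: the target for a masked token is not its raw coordinates but an encoding that already mixes in information from the rest of the sequence.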

Code & Models

Repositories

ruizhuo-xu/skeleton2vec (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Anomaly Detection Techniques and Applications
