Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning

Yuxiao Chen, Long Zhao, Jianbo Yuan, Yu Tian, Zhaoyang Xia, Shijie Geng, Ligong Han, and Dimitris N. Metaxas

arXiv:2207.09644·cs.CV·March 28, 2023

TL;DR

This paper introduces Hi-TRS, a hierarchical Transformer-based model that employs self-supervised pre-training to effectively learn spatial and temporal features of human skeleton sequences, improving performance across multiple downstream tasks.

Contribution

It proposes a novel hierarchical self-supervised pre-training scheme integrated into a Transformer encoder to explicitly model multi-level dependencies in skeleton sequences.

Findings

1. Reports state-of-the-art results on three downstream tasks: skeleton-based action recognition, action detection, and motion prediction.
2. Shows that the pre-trained representations transfer well across these tasks.
3. Outperforms existing contrastive learning methods for skeleton sequence modeling.

Abstract

Despite the success of fully-supervised human skeleton sequence modeling, utilizing self-supervised pre-training for skeleton sequence representation learning has been an active field because acquiring task-specific skeleton annotations at large scales is difficult. Recent studies focus on learning video-level temporal and discriminative information using contrastive learning, but overlook the hierarchical spatial-temporal nature of human skeletons. Different from such superficial supervision at the video level, we propose a self-supervised hierarchical pre-training scheme incorporated into a hierarchical Transformer-based skeleton sequence encoder (Hi-TRS), to explicitly capture spatial, short-term, and long-term temporal dependencies at frame, clip, and video levels, respectively. To evaluate the proposed self-supervised pre-training scheme with Hi-TRS, we conduct extensive…
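The three levels named in the abstract (frame, clip, video) can be pictured as a nested partition of a skeleton sequence. The sketch below is only an illustration of that nesting, not the authors' implementation: the function names, the sliding-window clip construction, and the `clip_len` and `stride` parameters are all assumptions introduced here.

```python
from typing import Dict, List

Joint = List[float]   # coordinates of one joint, e.g. [x, y, z]
Frame = List[Joint]   # all joints observed in one frame
Clip = List[Frame]    # a short run of consecutive frames


def split_into_clips(video: List[Frame], clip_len: int, stride: int) -> List[Clip]:
    """Slide a window of `clip_len` frames over the video (hypothetical scheme)."""
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    last_start = len(video) - clip_len
    return [video[s:s + clip_len] for s in range(0, last_start + 1, stride)]


def describe_hierarchy(video: List[Frame], clip_len: int, stride: int) -> Dict[str, int]:
    """Count the units each level would attend over.

    Frame level: joints within a frame (spatial dependencies).
    Clip level:  frames within a clip (short-term temporal dependencies).
    Video level: clips within the video (long-term temporal dependencies).
    """
    clips = split_into_clips(video, clip_len, stride)
    return {
        "joints_per_frame": len(video[0]) if video else 0,
        "frames_per_clip": clip_len if clips else 0,
        "clips_per_video": len(clips),
    }


# Toy input: 8 frames, 5 joints per frame, 3-D coordinates (all values made up).
toy_video = [[[0.0, 0.0, 0.0] for _ in range(5)] for _ in range(8)]
summary = describe_hierarchy(toy_video, clip_len=4, stride=2)
```

In this toy setup, a frame-level encoder would see 5 joints at a time, a clip-level encoder 4 frames at a time, and a video-level encoder the resulting sequence of clips. How Hi-TRS actually forms clips and which pre-training tasks it attaches to each level is described in the paper and the official repository.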

Code & Models

Repositories

yuxiaochen1103/Hi-TRS (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Gait Recognition and Analysis