Hierarchical Contrast for Unsupervised Skeleton-based Action Representation Learning
Jianfeng Dong, Shengkai Sun, Zhonglin Liu, Shujie Chen, Baolong Liu, Xun Wang

TL;DR
This paper introduces HiCo, a hierarchical contrastive learning framework for unsupervised skeleton-based action representation, leveraging multi-level features and hierarchical contrast to improve action recognition and retrieval.
Contribution
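The paper proposes a hierarchical contrastive learning method that encodes a skeleton sequence into features of several granularities across the temporal and spatial domains and contrasts them level by level, rather than only at the instance level, to improve unsupervised action representation learning.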
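To make "multi-level features" concrete, here is a minimal sketch that derives coarser temporal granularities from per-frame features. It assumes the per-frame features come from some sequence-to-sequence encoder and uses plain stride-2 average pooling as a stand-in for HiCo's "unified downsampling modules". The pooling choice and the helper names (`avg_pool_pairs`, `temporal_pyramid`, `instance_feature`) are hypothetical illustrations, not the paper's implementation.

```python
from typing import List

Vector = List[float]


def avg_pool_pairs(seq: List[Vector]) -> List[Vector]:
    """Halve the temporal length by averaging adjacent frames (stride 2).

    A trailing odd frame is averaged over a window of one, i.e. kept as-is.
    """
    pooled = []
    for i in range(0, len(seq), 2):
        window = seq[i:i + 2]
        pooled.append([sum(vals) / len(window) for vals in zip(*window)])
    return pooled


def temporal_pyramid(frame_feats: List[Vector], levels: int) -> List[List[Vector]]:
    """Return `levels` granularities, from finest (per frame) to coarsest."""
    pyramid = [frame_feats]
    for _ in range(levels - 1):
        pyramid.append(avg_pool_pairs(pyramid[-1]))
    return pyramid


def instance_feature(frame_feats: List[Vector]) -> Vector:
    """Global average over time: a single instance-level vector."""
    n = len(frame_feats)
    return [sum(vals) / n for vals in zip(*frame_feats)]


if __name__ == "__main__":
    frames = [[1.0, 0.0], [3.0, 0.0], [5.0, 2.0], [7.0, 2.0]]
    for level, feats in enumerate(temporal_pyramid(frames, levels=3)):
        print(level, feats)
    print("instance", instance_feature(frames))
```

With four frames and three levels, the pyramid holds 4, 2, and 1 vectors; the coarsest level coincides with the instance-level average here only because the length is a power of two. A learned downsampling layer would replace the fixed averaging in practice.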
Findings
Achieves state-of-the-art results on four datasets.
Effective for both unsupervised and semi-supervised tasks.
Learned representations transfer well to downstream tasks.
Abstract
This paper targets unsupervised skeleton-based action representation learning and proposes a new Hierarchical Contrast (HiCo) framework. Unlike existing contrastive solutions, which typically encode an input skeleton sequence as instance-level features and perform contrast holistically, HiCo encodes the input as multi-level features and performs contrast hierarchically. Specifically, given a human skeleton sequence, we encode it into multiple feature vectors of different granularities in both the temporal and spatial domains via sequence-to-sequence (S2S) encoders and unified downsampling modules. The hierarchical contrast is then conducted at four levels: instance level, domain level, clip level, and part level. Moreover, HiCo is orthogonal to the S2S encoder, which allows us to flexibly embrace state-of-the-art S2S…
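The four-level contrast can be pictured as one contrastive loss per level, summed. The sketch below assumes an InfoNCE loss with in-batch negatives, cosine similarity, a temperature of 0.07, one vector per sample per level, and equal default weights; none of these choices is stated in the abstract, and `info_nce` and `hierarchical_loss` are hypothetical names, not HiCo's code.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]
LEVELS = ("instance", "domain", "clip", "part")


def normalize(v: Vector) -> Vector:
    """L2-normalize a vector so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(queries: List[Vector], keys: List[Vector], temperature: float = 0.07) -> float:
    """Mean InfoNCE loss: keys[i] is the positive for queries[i];
    every other key in the batch acts as a negative."""
    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # -log softmax of the positive
    return total / len(q)


def hierarchical_loss(
    view_a: Dict[str, List[Vector]],
    view_b: Dict[str, List[Vector]],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of per-level InfoNCE losses between two augmented views."""
    weights = weights or {lvl: 1.0 for lvl in LEVELS}
    return sum(weights[lvl] * info_nce(view_a[lvl], view_b[lvl]) for lvl in LEVELS)


if __name__ == "__main__":
    batch = [[1.0, 0.0], [0.0, 1.0]]
    views = {lvl: batch for lvl in LEVELS}
    print(hierarchical_loss(views, views))  # close to 0: positives already align
```

In this simplification each level contributes independently; the paper may instead use a momentum encoder, a negative queue, or several part or clip vectors per sample, which would change how positives and negatives are formed.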
