TL;DR
This paper introduces MS$^2$L, a multi-task self-supervised learning framework that combines motion prediction, jigsaw puzzle recognition, and contrastive learning to improve skeleton-based action recognition.
Contribution
It proposes a novel multi-task self-supervised approach that enhances feature generalization for action recognition from skeleton data.
Findings
Achieves superior performance on the NW-UCLA, NTU RGB+D, and PKU-MMD datasets.
Effectively learns discriminative features across different training settings.
Demonstrates the benefit of multi-task learning in self-supervised skeleton action recognition.
Abstract
In this paper, we address self-supervised representation learning from human skeletons for action recognition. Previous methods, which usually learn feature representations from a single reconstruction task, may come across the overfitting problem, and the features are not generalizable for action recognition. Instead, we propose to integrate multiple tasks to learn more general representations in a self-supervised manner. To realize this goal, we integrate motion prediction, jigsaw puzzle recognition, and contrastive learning to learn skeleton features from different aspects. Skeleton dynamics can be modeled through motion prediction by predicting the future sequence. And temporal patterns, which are critical for action recognition, are learned through solving jigsaw puzzles. We further regularize the feature space by contrastive learning. Besides, we explore different training…
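The jigsaw task and the multi-task combination described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `make_jigsaw_sample` and `multitask_loss`, the choice of three segments, and the equal default loss weights are all assumptions made for this example. The idea is that a skeleton sequence is cut into temporal segments, the segments are shuffled, and the index of the shuffling permutation serves as a free classification label; the three task losses are then summed with weights.

```python
import itertools
import random


def make_jigsaw_sample(frames, num_segments=3, rng=None):
    """Shuffle temporal segments of a frame sequence.

    Returns (shuffled_frames, label), where label is the index of the
    applied permutation among all orderings of the segments.
    The segment count is a hypothetical choice for this sketch.
    """
    rng = rng or random.Random()
    seg_len = len(frames) // num_segments
    if seg_len == 0:
        raise ValueError("sequence is shorter than the number of segments")
    # Trailing frames that do not fill a whole segment are dropped.
    segments = [frames[i * seg_len:(i + 1) * seg_len]
                for i in range(num_segments)]
    # Every ordering of the segments is one class of the jigsaw classifier.
    permutations = list(itertools.permutations(range(num_segments)))
    label = rng.randrange(len(permutations))
    order = permutations[label]
    shuffled = [frame for idx in order for frame in segments[idx]]
    return shuffled, label


def multitask_loss(prediction_loss, jigsaw_loss, contrastive_loss,
                   weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three task losses (weights are assumed values)."""
    w_pred, w_jig, w_con = weights
    return (w_pred * prediction_loss
            + w_jig * jigsaw_loss
            + w_con * contrastive_loss)


if __name__ == "__main__":
    # Integers stand in for per-frame skeleton joint coordinates.
    frames = list(range(12))
    shuffled, label = make_jigsaw_sample(frames, rng=random.Random(0))
    print(shuffled, label)
```

In a real pipeline each frame would be an array of joint coordinates rather than an integer, and the per-task losses would come from a shared encoder with one head per task; how the tasks are weighted or scheduled is a design choice this sketch leaves open.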
Taxonomy
Methods: Contrastive Learning · Jigsaw
