Action Motifs: Self-Supervised Hierarchical Representation of Human Body Movements
Genki Kinoshita, Shu Nakamura, Ryo Kawahara, Shohei Nobuhara, Yasutomo Kawanishi, Ko Nishino

TL;DR
This paper introduces A4Mer, a hierarchical, self-supervised Transformer model that learns reusable human body movement representations called Action Motifs from pose data, improving action recognition, motion prediction, and interpolation.
Contribution
The paper proposes a novel nested Transformer architecture for hierarchical human movement representation and introduces the AMD dataset with foot-mounted cameras for occlusion-robust annotations.
Findings
A4Mer effectively captures Action Motifs from pose sequences.
Hierarchical representations improve action recognition, motion prediction, and interpolation.
The AMD dataset provides extensive multi-view human behavior data with full SMPL annotations.
Abstract
Effective human behavior modeling requires a representation of the human body movement that capitalizes on its compositionality. We propose a hierarchical representation consisting of Action Atoms that capture the atomic joint movements and Action Motifs that are formed by their temporal compositions and encode similar body movements found across different overall human actions. We derive A4Mer, a nested latent Transformer to learn this hierarchical representation from human pose data in a fully self-supervised manner. A4Mer splits a 3D pose sequence into variable-length segments and represents each segment as a single latent token (Action Atoms). Through bottom-up representation learning, temporal patterns composed of these Action Atoms, which capture meaningful temporal spans of reusable, semantic segments of body movements, naturally emerge (Action Motifs). A4Mer achieves this with a…
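To make the segment-to-token idea concrete, here is a minimal sketch of splitting a pose sequence into variable-length segments and collapsing each segment into one vector. It is not the authors' method: A4Mer learns both the segmentation and the latent tokens with a nested Transformer, whereas this sketch assumes a hand-written pause heuristic and mean pooling. The names `split_segments`, `pool_segment`, and `pause_threshold` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Frame = List[float]  # flattened 3D joint coordinates for one time step


def frame_speed(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two consecutive flattened poses."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def split_segments(poses: List[Frame], pause_threshold: float) -> List[List[Frame]]:
    """Cut the sequence wherever motion nearly stops.

    Assumption: a pause marks a segment boundary. The paper learns its
    boundaries instead; this heuristic only stands in for that step.
    """
    if not poses:
        return []
    segments: List[List[Frame]] = []
    current: List[Frame] = [poses[0]]
    for prev, cur in zip(poses, poses[1:]):
        # Close the running segment at a pause, unless it holds a single frame.
        if frame_speed(prev, cur) < pause_threshold and len(current) > 1:
            segments.append(current)
            current = []
        current.append(cur)
    segments.append(current)
    return segments


def pool_segment(segment: List[Frame]) -> Frame:
    """Collapse one segment into a single vector by averaging its frames.

    Assumption: mean pooling stands in for a learned latent token.
    """
    n = len(segment)
    return [sum(column) / n for column in zip(*segment)]


# Toy 1-D "pose" sequence with a pause between frames 2 and 3.
poses = [[0.0], [1.0], [2.0], [2.0], [3.0], [4.0]]
segments = split_segments(poses, pause_threshold=0.5)
tokens = [pool_segment(s) for s in segments]
```

In this toy run the pause splits the six frames into two segments of three frames each, and every segment is reduced to one vector regardless of its length. That fixed-size-per-segment property is what lets a second stage look for recurring patterns over the token sequence.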
