Action-Conditioned 3D Human Motion Synthesis with Transformer VAE
Mathis Petrovich, Michael J. Black, Gül Varol

TL;DR
This paper introduces ACTOR, a Transformer-based VAE model for generating diverse, action-conditioned 3D human motion sequences without initial poses, improving over previous methods and enabling applications like data augmentation and motion denoising.
Contribution
The paper proposes a novel Transformer-based VAE architecture for action-conditioned 3D human motion synthesis, capable of generating variable-length sequences without initial poses.
Findings
Outperforms the state of the art on the NTU RGB+D, HumanAct12, and UESTC datasets.
Enables data augmentation to improve action recognition.
Provides effective motion denoising capabilities.
Abstract
We tackle the problem of action-conditioned generation of realistic and diverse human motion sequences. In contrast to methods that complete, or extend, motion sequences, this task does not require an initial pose or sequence. Here we learn an action-aware latent representation for human motions by training a generative variational autoencoder (VAE). By sampling from this latent space and querying a certain duration through a series of positional encodings, we synthesize variable-length motion sequences conditioned on a categorical action. Specifically, we design a Transformer-based architecture, ACTOR, for encoding and decoding a sequence of parametric SMPL human body models estimated from action recognition datasets. We evaluate our approach on the NTU RGB+D, HumanAct12 and UESTC datasets and show improvements over the state of the art. Furthermore, we present two use cases: improving…
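The idea of "querying a certain duration through a series of positional encodings" can be illustrated with a minimal sketch. It assumes the standard sinusoidal encoding from the original Transformer; the function name `sinusoidal_queries`, the dimensions, and the standard-normal latent sample are hypothetical choices for illustration, not the authors' implementation. The point is only that the number of encodings requested fixes the length of the generated sequence, while the latent sample stays the same size.

```python
import math
import random


def sinusoidal_queries(num_frames, dim):
    """Return one sinusoidal positional encoding of size `dim` per frame.

    Hypothetical helper: a decoder would receive these rows as time queries,
    so `num_frames` determines how many poses are produced.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    queries = []
    for t in range(num_frames):
        row = []
        for i in range(0, dim, 2):
            freq = 1.0 / (10000 ** (i / dim))
            row.append(math.sin(t * freq))  # even index: sine
            row.append(math.cos(t * freq))  # odd index: cosine
        queries.append(row)
    return queries


# Assumption: the latent vector is drawn from a standard normal prior.
random.seed(0)
latent_dim = 8
z = [random.gauss(0.0, 1.0) for _ in range(latent_dim)]

# The same latent sample can be paired with different requested durations.
short_clip = sinusoidal_queries(num_frames=60, dim=latent_dim)
long_clip = sinusoidal_queries(num_frames=120, dim=latent_dim)
print(len(short_clip), len(long_clip))
```

Because the encodings depend only on the frame index, the first 60 queries of the longer request match the shorter request exactly; only the count of queries changes with the requested duration.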
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis
