ActFormer: A GAN-based Transformer towards General Action-Conditioned 3D Human Motion Generation
Liang Xu, Ziyang Song, Dongliang Wang, Jing Su, Zhicheng Fang, Chenjing Ding, Weihao Gan, Yichao Yan, Xin Jin, Xiaokang Yang, Wenjun Zeng, Wei Wu

TL;DR
This paper introduces ActFormer, a GAN-based Transformer that generates action-conditioned 3D human motion for both single-person actions and multi-person interactions, using a Gaussian Process latent prior to supply temporal correlations.
Contribution
The paper proposes a novel GAN-based Transformer architecture with a Gaussian Process prior for versatile 3D human motion generation, including multi-person interactions.
Findings
Outperforms state-of-the-art methods on multiple datasets
Successfully models complex multi-person interactions
Extends to multi-person motion with high fidelity
Abstract
We present a GAN-based Transformer for general action-conditioned 3D human motion generation, including not only single-person actions but also multi-person interactive actions. Our approach consists of a powerful Action-conditioned motion TransFormer (ActFormer) under a GAN training scheme, equipped with a Gaussian Process latent prior. Such a design combines the strong spatio-temporal representation capacity of Transformer, superiority in generative modeling of GAN, and inherent temporal correlations from the latent prior. Furthermore, ActFormer can be naturally extended to multi-person motions by alternately modeling temporal correlations and human interactions with Transformer encoders. To further facilitate research on multi-person motion generation, we introduce a new synthetic dataset of complex multi-person combat behaviors. Extensive experiments on NTU-13, NTU RGB+D 120, BABEL…
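To make the "Gaussian Process latent prior" concrete, here is a minimal sketch of how a temporally correlated latent sequence could be sampled before being fed to a generator. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name a kernel, so the squared-exponential (RBF) kernel, the `length_scale` parameter, the jitter value, and the function names `rbf_kernel`, `cholesky`, and `sample_gp_latents` are all hypothetical choices. Only the Python standard library is used so the example stays dependency-free.

```python
import math
import random


def rbf_kernel(num_frames, length_scale):
    """Squared-exponential covariance between frame indices 0..num_frames-1.

    The kernel choice is an assumption made for illustration only.
    """
    return [
        [math.exp(-((i - j) ** 2) / (2.0 * length_scale ** 2))
         for j in range(num_frames)]
        for i in range(num_frames)
    ]


def cholesky(matrix, jitter=1e-6):
    """Lower-triangular Cholesky factor of (matrix + jitter * I).

    The jitter keeps the factorization numerically stable, since RBF
    kernel matrices are often close to singular.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] + jitter - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sample_gp_latents(num_frames, latent_dim, length_scale, rng):
    """Draw a num_frames x latent_dim latent sequence.

    Each latent channel is an independent zero-mean GP sample over time,
    so neighboring frames receive correlated latent values.
    """
    lower = cholesky(rbf_kernel(num_frames, length_scale))
    latents = [[0.0] * latent_dim for _ in range(num_frames)]
    for d in range(latent_dim):
        noise = [rng.gauss(0.0, 1.0) for _ in range(num_frames)]
        for t in range(num_frames):
            # z = L @ eps yields covariance L L^T = K + jitter * I.
            latents[t][d] = sum(lower[t][k] * noise[k] for k in range(t + 1))
    return latents


if __name__ == "__main__":
    rng = random.Random(0)
    z = sample_gp_latents(num_frames=8, latent_dim=3, length_scale=2.0, rng=rng)
    for frame in z:
        print(["%.3f" % value for value in frame])
```

Compared with drawing independent noise per frame, a sample like this varies smoothly along the time axis, which is one way a latent prior can contribute the "inherent temporal correlations" the abstract mentions. A larger `length_scale` gives smoother sequences; a very small one approaches independent per-frame noise.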
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Softmax · Dense Connections · Residual Connection · Dropout · Layer Normalization · Gaussian Process
