ActFormer: A GAN-based Transformer towards General Action-Conditioned 3D Human Motion Generation

Liang Xu; Ziyang Song; Dongliang Wang; Jing Su; Zhicheng Fang; Chenjing Ding; Weihao Gan; Yichao Yan; Xin Jin; Xiaokang Yang; Wenjun Zeng; Wei Wu

arXiv:2203.07706·cs.CV·November 24, 2022

TL;DR

This paper introduces ActFormer, a GAN-based Transformer that generates both single-person and multi-person 3D human motions conditioned on an action, using a Gaussian Process latent prior and extending to complex interactive behaviors.

Contribution

The paper proposes a novel GAN-based Transformer architecture with a Gaussian Process prior for versatile 3D human motion generation, including multi-person interactions.

Findings

01 Outperforms state-of-the-art on multiple datasets

02 Successfully models complex multi-person interactions

03 Extends to multi-person motion with high fidelity

Abstract

We present a GAN-based Transformer for general action-conditioned 3D human motion generation, including not only single-person actions but also multi-person interactive actions. Our approach consists of a powerful Action-conditioned motion TransFormer (ActFormer) under a GAN training scheme, equipped with a Gaussian Process latent prior. Such a design combines the strong spatio-temporal representation capacity of Transformer, superiority in generative modeling of GAN, and inherent temporal correlations from the latent prior. Furthermore, ActFormer can be naturally extended to multi-person motions by alternately modeling temporal correlations and human interactions with Transformer encoders. To further facilitate research on multi-person motion generation, we introduce a new synthetic dataset of complex multi-person combat behaviors. Extensive experiments on NTU-13, NTU RGB+D 120, BABEL…
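The abstract credits the Gaussian Process latent prior with supplying "inherent temporal correlations". A minimal sketch of that idea follows, using only the Python standard library: each latent channel is drawn as one GP sample over frame indices, so neighbouring frames receive similar latent values. The squared-exponential kernel, the `length_scale` value, the jitter term, and all function names here are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def rbf_kernel(num_frames, length_scale):
    """Squared-exponential covariance between frame indices (assumed kernel)."""
    return [
        [math.exp(-((i - j) ** 2) / (2.0 * length_scale ** 2))
         for j in range(num_frames)]
        for i in range(num_frames)
    ]


def cholesky(matrix, jitter=1e-6):
    """Lower-triangular factor of (matrix + jitter * I)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] + jitter - acc)
            else:
                lower[i][j] = (matrix[i][j] - acc) / lower[j][j]
    return lower


def sample_gp_latents(num_frames, latent_dim, length_scale=5.0, seed=0):
    """Return a num_frames x latent_dim latent sequence.

    Each channel is an independent zero-mean GP sample over time,
    obtained by colouring white noise with the Cholesky factor.
    """
    rng = random.Random(seed)
    lower = cholesky(rbf_kernel(num_frames, length_scale))
    latents = [[0.0] * latent_dim for _ in range(num_frames)]
    for d in range(latent_dim):
        white = [rng.gauss(0.0, 1.0) for _ in range(num_frames)]
        for t in range(num_frames):
            latents[t][d] = sum(lower[t][k] * white[k] for k in range(t + 1))
    return latents


if __name__ == "__main__":
    z = sample_gp_latents(num_frames=20, latent_dim=4)
    print(len(z), len(z[0]))
```

In a generator of the kind the abstract describes, a sequence like this would stand in for independent per-frame noise; a larger `length_scale` gives smoother latent trajectories, while a very small one approaches frame-wise independent Gaussian noise.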

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Softmax · Dense Connections · Residual Connection · Dropout · Layer Normalization · Gaussian Process