PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting
Thomas Lucas, Fabien Baradel, Philippe Weinzaepfel, Grégory Rogez

TL;DR
PoseGPT is a transformer-based model that generates and forecasts 3D human motion sequences by compressing motion into a discrete latent space, enabling flexible, long-range, action-conditioned prediction with state-of-the-art results.
Contribution
The paper presents PoseGPT, which combines an auto-encoder that compresses human motion into quantized latent index sequences with a GPT-like model trained for next-index prediction, covering both motion generation and motion forecasting.
Findings
Achieves state-of-the-art results on multiple datasets.
Effectively models long-range human motion sequences.
Handles arbitrary-length observations, including none.
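One way to picture conditioning on observations of arbitrary length, including none, is autoregressive sampling over latent indices in which the observed motion is simply a (possibly empty) prefix. The sketch below is a toy illustration under stated assumptions, not PoseGPT's implementation: `next_index_probs` is a hypothetical stand-in for the trained GPT-like model, and `CODEBOOK_SIZE`, the action encoding, and the sampling loop are all invented for the example.

```python
import random

CODEBOOK_SIZE = 8  # assumed toy value, not taken from the paper


def next_index_probs(context, action):
    """Hypothetical stand-in for a trained GPT-like model.

    Returns a probability distribution over the next latent index given
    the indices so far and an integer action label. Here it is a fixed
    toy rule that favours the index following the last one.
    """
    last = context[-1] if context else action % CODEBOOK_SIZE
    weights = [1.0] * CODEBOOK_SIZE
    weights[(last + 1) % CODEBOOK_SIZE] += 4.0
    total = sum(weights)
    return [w / total for w in weights]


def sample_indices(action, observed=(), length=6, seed=0):
    """Sample `length` new latent indices after an optional observed prefix."""
    rng = random.Random(seed)
    sequence = list(observed)  # an empty prefix means pure generation
    for _ in range(length):
        probs = next_index_probs(sequence, action)
        nxt = rng.choices(range(CODEBOOK_SIZE), weights=probs, k=1)[0]
        sequence.append(nxt)
    return sequence


generated = sample_indices(action=2)                     # no observation
forecast = sample_indices(action=2, observed=[3, 4, 5])  # with past motion
```

The same loop serves both cases: with no observed prefix it acts as an action-conditioned generator, and with a prefix it acts as a forecaster whose output keeps the observed indices unchanged.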
Abstract
We address the problem of action-conditioned generation of human motion sequences. Existing work falls into two categories: forecast models conditioned on observed past motions, or generative models conditioned on action labels and duration only. In contrast, we generate motion conditioned on observations of arbitrary length, including none. To solve this generalized problem, we propose PoseGPT, an auto-regressive transformer-based approach which internally compresses human motion into quantized latent sequences. An auto-encoder first maps human motion to latent index sequences in a discrete space, and vice-versa. Inspired by the Generative Pretrained Transformer (GPT), we propose to train a GPT-like model for next-index prediction in that space; this allows PoseGPT to output distributions on possible futures, with or without conditioning on past motion. The discrete and compressed…
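The mapping between motion and latent index sequences can be pictured as nearest-neighbour lookup in a codebook, as in standard vector-quantized auto-encoders. The following is a minimal sketch under assumptions: the 2-D latents and the four-entry codebook are hypothetical toy values, the encoder and decoder networks are omitted, and the exact quantization scheme used by PoseGPT is not stated in the text here.

```python
def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))
        for z in latents
    ]


def dequantize(indices, codebook):
    """Map latent indices back to their codebook vectors."""
    return [codebook[k] for k in indices]


# Hypothetical toy codebook and per-frame latents (not from the paper).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
latents = [(0.1, -0.2), (0.9, 0.2), (0.8, 0.9)]

indices = quantize(latents, codebook)  # [0, 1, 3]
recon = dequantize(indices, codebook)  # quantized latents a decoder would consume
```

Once motion is expressed as such an index sequence, next-index prediction becomes a classification problem over the codebook, which is what lets a GPT-like model output a distribution over possible futures.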
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · 3D Shape Modeling and Analysis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Adam · Label Smoothing · Absolute Position Encodings · Layer Normalization · Byte Pair Encoding
