T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences

Taeryung Lee, Fabien Baradel, Thomas Lucas, Kyoung Mu Lee, and Gregory Rogez

arXiv:2406.00636 · cs.CV · June 4, 2024

TL;DR

T2LM is a framework for generating long-term 3D human motion from multiple sentences. It pairs a VQVAE with a Transformer and is trained without sequential data, producing smooth, realistic motion sequences.

Contribution

T2LM introduces a continuous generation method that combines a VQVAE with a Transformer, enabling long-term motion synthesis without sequential training data.

Findings

01. Outperforms previous long-term motion generation models.

02. Produces smoother and more realistic motion sequences.

03. Competitive with state-of-the-art single-action models.

Abstract

In this paper, we address the challenging problem of long-term 3D human motion generation. Specifically, we aim to generate a long sequence of smoothly connected actions from a stream of multiple sentences (i.e., a paragraph). Previous long-term motion generation approaches were mostly based on recurrent methods, using previously generated motion chunks as input for the next step. However, these approaches have two drawbacks: 1) they rely on sequential datasets, which are expensive; 2) they yield unrealistic gaps between motions generated at each step. To address these issues, we introduce T2LM, a simple yet effective continuous long-term generation framework that can be trained without sequential data. T2LM comprises two components: a 1D-convolutional VQVAE, trained to compress motion to sequences of latent vectors, and a Transformer-based Text Encoder that predicts a latent…
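The quantization step that a VQVAE uses to turn a sequence of latent vectors into discrete codes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the codebook, the latent values, and the function names `quantize` and `dequantize` are hypothetical, and the 1D-convolutional encoder and decoder are omitted.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each latent vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def dequantize(indices: List[int], codebook: List[Vector]) -> List[Vector]:
    """Look up the codebook vector for each index (what a decoder would consume)."""
    return [codebook[i] for i in indices]


# Hypothetical toy values: a 3-entry codebook of 2-D codes and a
# 4-step sequence of encoder outputs (one latent vector per time step).
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 0.5)]
latents = [(0.1, -0.1), (0.9, 1.2), (-0.8, 0.4), (0.6, 0.7)]

indices = quantize(latents, codebook)
quantized = dequantize(indices, codebook)
print(indices)  # [0, 1, 2, 1]
```

Each time step is replaced by its nearest codebook vector, so a motion clip becomes a short sequence of integer indices. A text-conditioned model can then be trained to predict latent sequences of this kind instead of raw poses.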

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Video Analysis and Summarization

Methods: VQ-VAE