LS-GAN: Human Motion Synthesis with Latent-space GANs
Avinash Amballa, Gayathri Akkinapalli, Vinitra Muralikrishnan

TL;DR
This paper introduces a latent-space GAN framework for human motion synthesis conditioned on text, achieving faster training and inference with high-quality results comparable to diffusion models.
Contribution
The paper presents a novel latent-space GAN approach for text-conditioned human motion synthesis, reducing computational costs while maintaining high-quality outputs.
Findings
Achieved a FID of 0.482 on benchmarks.
Reduced FLOPs by over 91% compared to diffusion models.
Demonstrated results competitive with state-of-the-art methods.
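For readers unfamiliar with the FID metric quoted above, the following is a minimal sketch of the Fréchet distance between two sets of feature vectors, each modelled as a Gaussian. It assumes the features have already been extracted by some pretrained encoder, which is not included here; the function name and the toy data are illustrative and do not come from the paper.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two (n, d) feature arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs (clipping guards against round-off).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

# Toy data (hypothetical): "generated" features are the real ones shifted by 1.
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 4))
fake_feats = real_feats + 1.0
score = frechet_distance(real_feats, fake_feats)
```

Because the shifted set has the same covariance as the original, only the mean term contributes here; lower scores indicate that the two feature distributions are closer.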
Abstract
Human motion synthesis conditioned on textual input has gained significant attention in recent years due to its potential applications in various domains such as gaming, film production, and virtual reality. Conditioned motion synthesis takes a text input and outputs a 3D motion corresponding to the text. While previous works have explored motion synthesis using raw motion data and latent space representations with diffusion models, these approaches often suffer from high training and inference times. In this paper, we introduce a novel framework that utilizes Generative Adversarial Networks (GANs) in the latent space to enable faster training and inference while achieving results comparable to those of the state-of-the-art diffusion methods. We perform experiments on the HumanML3D and HumanAct12 benchmarks and demonstrate that a remarkably simple GAN in the latent space achieves a FID of…
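The idea of a text-conditioned GAN operating on latent vectors can be sketched roughly as follows. This is not the authors' architecture: the layer sizes, the two-layer MLPs, the random stand-in tensors, and the standard non-saturating GAN loss are all assumptions chosen for illustration, and the motion latents are assumed to come from a separately pretrained motion autoencoder that the sketch does not include. Only a forward pass and loss computation are shown, with no training step.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes, not taken from the paper.
NOISE_DIM, TEXT_DIM, LATENT_DIM, HIDDEN = 16, 32, 24, 64

def init_mlp(in_dim, out_dim):
    """Random, untrained weights for a two-layer MLP."""
    return {
        "w1": rng.normal(0.0, 0.1, (in_dim, HIDDEN)), "b1": np.zeros(HIDDEN),
        "w2": rng.normal(0.0, 0.1, (HIDDEN, out_dim)), "b2": np.zeros(out_dim),
    }

def mlp(params, x):
    hidden = np.maximum(x @ params["w1"] + params["b1"], 0.0)  # ReLU
    return hidden @ params["w2"] + params["b2"]

def generator(params, noise, text_emb):
    """Map noise plus a text embedding to a motion latent."""
    return mlp(params, np.concatenate([noise, text_emb], axis=1))

def discriminator(params, latent, text_emb):
    """Score a (latent, text) pair; returns one logit per sample."""
    return mlp(params, np.concatenate([latent, text_emb], axis=1))[:, 0]

def bce_with_logits(logits, target):
    """Numerically stable binary cross-entropy on raw logits."""
    return float(np.mean(np.maximum(logits, 0.0) - logits * target
                         + np.log1p(np.exp(-np.abs(logits)))))

g_params = init_mlp(NOISE_DIM + TEXT_DIM, LATENT_DIM)
d_params = init_mlp(LATENT_DIM + TEXT_DIM, 1)

batch = 8
text_emb = rng.normal(size=(batch, TEXT_DIM))       # stand-in for a text encoder output
real_latent = rng.normal(size=(batch, LATENT_DIM))  # stand-in for encoded real motions
noise = rng.normal(size=(batch, NOISE_DIM))

fake_latent = generator(g_params, noise, text_emb)
# Discriminator: push real pairs toward 1 and generated pairs toward 0.
d_loss = (bce_with_logits(discriminator(d_params, real_latent, text_emb), 1.0)
          + bce_with_logits(discriminator(d_params, fake_latent, text_emb), 0.0))
# Generator (non-saturating form): make generated pairs look real.
g_loss = bce_with_logits(discriminator(d_params, fake_latent, text_emb), 1.0)
```

In a setup like this, both networks operate on small latent vectors rather than full motion sequences, and sampling needs a single generator pass instead of an iterative denoising loop, which is the kind of cost saving the abstract attributes to working in latent space with a GAN.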
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · 3D Shape Modeling and Analysis
Methods: Softmax · Attention Is All You Need · Diffusion
