MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training

Kengo Uchida; Takashi Shibuya; Yuhta Takida; Naoki Murata; Julian Tanke; Shusuke Takahashi; Yuki Mitsufuji

arXiv:2406.01867 · cs.CV · April 15, 2025 · 1 citation


Open Access · 1 model

TL;DR

MoLA is a novel framework that combines latent diffusion, adversarial training, and a variational autoencoder to enable fast, high-quality, and controllable text-to-motion generation and editing of variable-length motions.

Contribution

It introduces a unified approach for motion generation and editing using latent diffusion enhanced by adversarial training, with a new motion representation for variable-length outputs.

Findings

01 Adversarial training improves motion generation quality.

02 The framework supports multiple motion editing tasks.

03 Motion generation is faster and more controllable.

Abstract

In text-to-motion generation, controllability as well as generation quality and speed has become increasingly critical. The controllability challenges include generating a motion of a length that matches the given textual description and editing the generated motions according to control signals, such as the start-end positions and the pelvis trajectory. In this paper, we propose MoLA, which provides fast, high-quality, variable-length motion generation and can also deal with multiple editing tasks in a single framework. Our approach revisits the motion representation used as inputs and outputs in the model, incorporating an activation variable to enable variable-length motion generation. Additionally, we integrate a variational autoencoder and a latent diffusion model, further enhanced through adversarial training, to achieve high-quality and fast generation. Moreover, we apply a…
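The abstract mentions an activation variable that enables variable-length generation but does not spell out how it works. The sketch below is one plausible reading, not the authors' implementation: it assumes the activation is an extra per-frame channel set to 1 for real frames and 0 for padding, and that a generated sequence is cut at the first frame whose activation falls to or below a threshold. The function names `pad_with_activation` and `trim_by_activation`, the channel layout, and the 0.5 threshold are all hypothetical.

```python
from typing import List, Sequence


def pad_with_activation(motion: Sequence[Sequence[float]], max_len: int) -> List[List[float]]:
    """Pad a motion to max_len frames and append an activation channel.

    Assumption (not from the paper): real frames get activation 1.0,
    padded frames are all zeros with activation 0.0.
    """
    if not motion:
        raise ValueError("motion must contain at least one frame")
    if len(motion) > max_len:
        raise ValueError("motion is longer than max_len")
    dim = len(motion[0])
    padded = [list(frame) + [1.0] for frame in motion]
    padded += [[0.0] * (dim + 1) for _ in range(max_len - len(motion))]
    return padded


def trim_by_activation(frames: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[float]]:
    """Recover a variable-length motion from a fixed-length output.

    Keeps leading frames while the last channel (the assumed activation)
    stays above the threshold, and drops that channel from the result.
    """
    kept = []
    for frame in frames:
        *pose, activation = frame
        if activation <= threshold:
            break  # first inactive frame marks the end of the motion
        kept.append(list(pose))
    return kept


# Toy example: a 3-frame motion with 2 features per frame, padded to 5 frames.
motion = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
padded = pad_with_activation(motion, max_len=5)
recovered = trim_by_activation(padded)
```

Under these assumptions a fixed-size model output can still encode its own length, so no separate length predictor is needed at inference time.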

Code & Models

Models

Sony/MoLA (Hugging Face model) · ♡ 1

Taxonomy

Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion