MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks

Yiming Wu; Wei Ji; Kecheng Zheng; Zicheng Wang; Dong Xu

arXiv:2411.19786 · cs.CV · December 2, 2024

TL;DR

MoTe is a unified multi-modal diffusion model that handles several motion and text generation tasks by learning the joint, marginal, and conditional distributions of motion and text, with superior text-to-motion results and competitive motion-captioning results reported on benchmarks.

Contribution

The paper introduces MoTe, a novel model that unifies multiple motion-text tasks within a single framework using diffusion models and multi-modal encoders.

Findings

01. Superior performance on text-to-motion generation

02. Competitive results on motion captioning

03. Effective multi-task learning with a single model

Abstract

Recently, human motion analysis has improved greatly thanks to generative models such as the denoising diffusion model and the large language model. However, existing approaches mainly focus on generating motions from textual descriptions and overlook the reciprocal task. In this paper, we present MoTe, a unified multi-modal model that can handle diverse tasks by learning the marginal, conditional, and joint distributions of motion and text simultaneously. MoTe enables us to handle paired text-motion generation, motion captioning, and text-driven motion generation by simply modifying the input context. Specifically, MoTe is composed of three components: Motion Encoder-Decoder (MED), Text Encoder-Decoder (TED), and Motion-Text Diffusion Model (MTDM). In particular, MED and TED are trained for extracting latent embeddings, and subsequently reconstructing…
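The abstract does not spell out how the input context is modified to switch between tasks. As a rough illustration only, the sketch below uses a per-modality timestep scheme of the kind found in other unified multi-modal diffusion models such as UniDiffuser; it is an assumption, not MoTe's confirmed mechanism. The function name `task_timesteps`, the task labels, and the step count `NUM_STEPS` are all hypothetical.

```python
# Hypothetical sketch: one denoiser, several tasks, selected by the
# noise level (timestep) assigned to each modality's latent.
# Assumption: this mirrors a UniDiffuser-style scheme, not necessarily MoTe.

NUM_STEPS = 1000  # assumed total number of diffusion steps


def task_timesteps(task, t, num_steps=NUM_STEPS):
    """Return (t_motion, t_text) for the requested task.

    t = 0          -> the modality is given clean (it acts as the condition)
    t = num_steps  -> the modality is pure noise (it is marginalized out)
    otherwise      -> the modality is being denoised at step t
    """
    if task == "joint":            # paired text-motion generation
        return t, t
    if task == "text_to_motion":   # text-driven motion generation
        return t, 0
    if task == "motion_to_text":   # motion captioning
        return 0, t
    if task == "motion_only":      # marginal over motion
        return t, num_steps
    if task == "text_only":        # marginal over text
        return num_steps, t
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    for name in ("joint", "text_to_motion", "motion_to_text"):
        print(name, task_timesteps(name, 500))
```

Under this assumed scheme, a single network that receives both latents together with their two timesteps can be queried for any of the three tasks named in the abstract without changing its weights.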

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning

Methods: Focus · Diffusion