X-MoGen: Unified Motion Generation across Humans and Animals

Xuan Wang; Kai Ruan; Liyang Qian; Zhizhi Guo; Chang Su; Gaoang Wang

arXiv:2508.05162·cs.CV·November 18, 2025

TL;DR

X-MoGen introduces a unified framework for cross-species text-driven motion generation, enabling realistic motion synthesis for both humans and animals by addressing morphological differences with a novel two-stage architecture and a large-scale dataset.

Contribution

X-MoGen is the first unified approach to generating cross-species motion from text; it combines a novel architecture with a comprehensive dataset for joint training.

Findings

01. Outperforms existing methods on seen and unseen species

02. Achieves high skeletal plausibility across diverse species

03. Demonstrates effective generalization in motion generation

Abstract

Text-driven motion generation has attracted increasing attention due to its broad applications in virtual reality, animation, and robotics. While existing methods typically model human and animal motion separately, a joint cross-species approach offers key advantages, such as a unified representation and improved generalization. However, morphological differences across species remain a key challenge, often compromising motion plausibility. To address this, we propose X-MoGen, the first unified framework for cross-species text-driven motion generation covering both humans and animals. X-MoGen adopts a two-stage architecture. First, a conditional graph variational autoencoder learns canonical T-pose priors, while an autoencoder encodes motion into a shared latent space regularized by morphological loss. In the second stage, we perform masked motion modeling to generate motion embeddings…
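
As a rough illustration of the masked motion modeling step mentioned in the abstract, the sketch below hides a random subset of per-frame motion embeddings and scores a prediction only at the hidden positions. This is a generic masked-modeling sketch, not the authors' implementation: the function names `mask_embeddings` and `masked_mse`, the zero vector used as the mask placeholder, the mask ratio, and the mean-squared-error objective are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def mask_embeddings(embeddings, mask_ratio, rng):
    """Hide a random subset of frames in a (T, D) embedding sequence.

    Assumption: masked frames are replaced by a zero vector. A real model
    would more likely use a learned mask embedding.
    Returns the corrupted sequence and a boolean (T,) mask.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    num_frames = embeddings.shape[0]
    num_masked = max(1, int(round(mask_ratio * num_frames)))

    # Sample distinct frame indices to hide.
    masked_idx = rng.choice(num_frames, size=num_masked, replace=False)
    is_masked = np.zeros(num_frames, dtype=bool)
    is_masked[masked_idx] = True

    corrupted = embeddings.copy()
    corrupted[is_masked] = 0.0
    return corrupted, is_masked


def masked_mse(predicted, target, is_masked):
    """Mean squared error computed over the masked frames only."""
    diff = (np.asarray(predicted, dtype=float) - np.asarray(target, dtype=float))[is_masked]
    return float((diff ** 2).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy sequence: 8 frames, 3-dimensional embeddings (hypothetical sizes).
    motion = np.arange(24, dtype=float).reshape(8, 3)
    corrupted, is_masked = mask_embeddings(motion, mask_ratio=0.5, rng=rng)
    print("masked frames:", np.flatnonzero(is_masked))
    print("loss of the corrupted input:", masked_mse(corrupted, motion, is_masked))
```

In a full system, a text-conditioned network would take `corrupted` together with the text features and try to recover the hidden frames; restricting the loss to masked positions keeps the model from being rewarded for copying the visible ones.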
