X-UniMotion: Animating Human Images with Expressive, Unified and Identity-Agnostic Motion Latents

Guoxian Song; Hongyi Xu; Xiaochen Zhao; You Xie; Tianpei Gu; Zenan Li; Chenxu Zhang; Linjie Luo

arXiv:2508.09383·cs.CV·August 14, 2025

TL;DR

X-UniMotion introduces a unified, expressive, and identity-agnostic motion representation for human images, enabling high-fidelity, cross-identity motion transfer across diverse subjects and poses.

Contribution

The paper proposes an implicit latent representation of whole-body human motion built from four disentangled tokens (one for facial expression, one for body pose, and one for each hand), learned with a self-supervised framework on large datasets.

Findings

01. Outperforms existing methods in motion fidelity and identity preservation.

02. Enables detailed and expressive cross-identity human motion transfer.

03. Uses a self-supervised, end-to-end training approach with auxiliary decoders.

Abstract

We present X-UniMotion, a unified and expressive implicit latent representation for whole-body human motion, encompassing facial expressions, body poses, and hand gestures. Unlike prior motion transfer methods that rely on explicit skeletal poses and heuristic cross-identity adjustments, our approach encodes multi-granular motion directly from a single image into a compact set of four disentangled latent tokens -- one for facial expression, one for body pose, and one for each hand. These motion latents are both highly expressive and identity-agnostic, enabling high-fidelity, detailed cross-identity motion transfer across subjects with diverse identities, poses, and spatial configurations. To achieve this, we introduce a self-supervised, end-to-end framework that jointly learns the motion encoder and latent representation alongside a DiT-based video generative model, trained on…
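To make the four-token layout described in the abstract concrete, here is a minimal Python sketch of how such a representation might be organized as a data structure. It is an illustration only, not the authors' implementation: the names `MotionLatents`, `toy_encode_motion`, `GeneratorInput`, and `cross_identity_transfer`, the left/right hand naming, the token dimension, and the slicing "encoder" are all hypothetical stand-ins for the learned motion encoder and the DiT-based generator.

```python
from dataclasses import dataclass
from typing import List, Sequence

Token = List[float]  # one latent token, modeled here as a plain float vector


@dataclass(frozen=True)
class MotionLatents:
    """Four motion tokens: face, body, and one per hand (names are assumptions)."""
    face: Token
    body: Token
    left_hand: Token
    right_hand: Token

    def as_sequence(self) -> List[Token]:
        # Order is an arbitrary choice made for this sketch.
        return [self.face, self.body, self.left_hand, self.right_hand]


def toy_encode_motion(image_features: Sequence[float], dim: int = 4) -> MotionLatents:
    """Placeholder encoder: slices a flat feature vector into four tokens.

    A real encoder would be a learned network; slicing is used only so the
    example runs without any dependencies.
    """
    if len(image_features) < 4 * dim:
        raise ValueError("need at least 4 * dim features")
    chunks = [list(image_features[i * dim:(i + 1) * dim]) for i in range(4)]
    return MotionLatents(*chunks)


@dataclass(frozen=True)
class GeneratorInput:
    """What a hypothetical generator would consume: identity plus motion."""
    reference_id: str       # stands in for the reference (identity) image
    motion: MotionLatents   # identity-agnostic motion from the driving image


def cross_identity_transfer(reference_id: str,
                            driving_features: Sequence[float],
                            dim: int = 4) -> GeneratorInput:
    # Identity comes from the reference; motion comes only from the driver.
    return GeneratorInput(reference_id, toy_encode_motion(driving_features, dim))


if __name__ == "__main__":
    feats = [float(i) for i in range(16)]
    out = cross_identity_transfer("subject_A", feats)
    print(out.reference_id, len(out.motion.as_sequence()))
```

The point of the sketch is the separation of concerns: the identity input and the motion tokens travel independently, which is what would let motion from one subject drive another.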
