A Unified Conditional Flow for Motion Generation, Editing, and Intra-Structural Retargeting
Junlin Li, Xinhao Song, Siqi Wang, Haibin Huang, Yili Zhao

TL;DR
This paper introduces a unified generative framework using conditional flow matching for motion generation, editing, and retargeting, enabling versatile and consistent motion synthesis from text and skeletal inputs.
Contribution
It presents a single model that unifies motion editing and retargeting tasks through conditional transport, leveraging flow matching and a transformer architecture.
Findings
Supports text-to-motion generation, editing, and retargeting with one model.
Outperforms task-specific baselines in structural consistency.
Enables zero-shot motion editing and retargeting.
Abstract
Text-driven motion editing and intra-structural retargeting, where source and target share topology but may differ in bone lengths, are traditionally handled by fragmented pipelines with incompatible inputs and representations: editing relies on specialized generative steering, while retargeting is deferred to geometric post-processing. We present a unifying perspective where both tasks are cast as instances of conditional transport within a single generative framework. By leveraging recent advances in flow matching, we demonstrate that editing and retargeting are fundamentally the same generative task, distinguished only by which conditioning signal, semantic or structural, is modulated during inference. We implement this vision via a rectified-flow motion model jointly conditioned on text prompts and target skeletal structures. Our architecture extends a DiT-style transformer with…
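To illustrate the idea in the abstract, here is a minimal NumPy sketch of rectified-flow sampling under two conditioning signals. It is a toy under stated assumptions, not the authors' model. The function names, the array shapes, and `toy_velocity` (a closed-form stand-in for the learned DiT-style network) are all hypothetical. It shows only that, with the starting noise held fixed, changing the text condition plays the role of editing and changing the skeleton condition plays the role of retargeting.

```python
import numpy as np

def rf_interpolate(x0, x1, t):
    """Rectified-flow path: straight line from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def rf_target_velocity(x0, x1):
    """Velocity regression target along that straight path (constant in t)."""
    return x1 - x0

def toy_velocity(x, t, text_emb, bone_lengths):
    """Hypothetical stand-in for a learned network v(x, t | text, skeleton).

    The 'clean motion' it points at is a made-up function of both conditions,
    chosen only so the example runs without any training.
    """
    target = np.outer(text_emb, bone_lengths)  # shape (frames, joints), illustrative
    return (target - x) / (1.0 - t)

def sample(x0, text_emb, bone_lengths, steps=8):
    """Euler integration of dx/dt = v(x, t | conditions) from t=0 to t=1."""
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # largest t is 1 - dt, so (1 - t) never reaches zero
        x = x + dt * toy_velocity(x, t, text_emb, bone_lengths)
    return x

rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 3))           # 4 frames, 3 joints (assumed shapes)
text_a = np.array([1.0, 0.5, 0.0, -0.5])      # placeholder "semantic" condition
text_b = np.array([0.0, 1.0, 1.0, 0.0])       # a different prompt embedding
skel_a = np.array([1.0, 1.0, 1.0])            # placeholder bone lengths
skel_b = np.array([1.2, 0.8, 1.0])            # same topology, different lengths

base = sample(noise, text_a, skel_a)          # generation
edited = sample(noise, text_b, skel_a)        # vary text only: editing
retargeted = sample(noise, text_a, skel_b)    # vary skeleton only: retargeting
```

In a trained model the velocity field would come from a network fitted to `rf_target_velocity` at points given by `rf_interpolate`. The closed-form field here exists only to make the sampler runnable end to end.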
