From Diffusion to Flow: Efficient Motion Generation in MotionGPT3
Jaymin Ban, JiHong Jeon, SangYeop Jeong

TL;DR
This paper empirically compares diffusion and rectified flow objectives in MotionGPT3 for text-driven motion generation, finding that rectified flow converges faster and delivers comparable or better motion quality.
Contribution
The paper provides a controlled empirical study of rectified flow versus diffusion in motion generation. Because the model architecture, training protocol, and evaluation setup are held fixed, the observed differences can be attributed to the training objective.
Findings
Rectified flow converges faster and reaches strong performance earlier.
Flow-based priors are stable across various inference step counts.
Flow-based methods achieve competitive quality with fewer sampling steps.
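To illustrate why a flow-based prior can tolerate few sampling steps, here is a minimal sketch of Euler sampling along a rectified-flow path. It is not the paper's sampler: `toy_velocity` is a hypothetical stand-in for a learned velocity network, chosen so that the flow toward a single target point is perfectly straight. In that idealized case the Euler result is the same for any step count; a real learned field is only approximately straight, so the effect is weaker in practice.

```python
import numpy as np

def toy_velocity(x, t, target):
    # Hypothetical stand-in for a learned velocity network.
    # For a single target point, the rectified-flow velocity at (x, t)
    # points straight at the target: (target - x) / (1 - t).
    return (target - x) / (1.0 - t)

def euler_sample(x0, target, num_steps):
    # Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (sample)
    # with fixed-size Euler steps. The last step starts at
    # t = (num_steps - 1) / num_steps, so 1 - t is never zero.
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * toy_velocity(x, t, target)
    return x

rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])   # illustrative "latent" to reach
x0 = rng.standard_normal(3)           # Gaussian noise start

# Same start, different step budgets.
samples = {n: euler_sample(x0, target, n) for n in (1, 4, 16)}
for n, s in samples.items():
    print(n, np.round(s, 6))
```

Because the toy path is a straight line, one Euler step already lands on the target. This shows only the mechanism behind the step-count finding, not the reported results.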
Abstract
Recent text-driven motion generation methods span both discrete token-based approaches and continuous-latent formulations. MotionGPT3 exemplifies the latter paradigm, combining a learned continuous motion latent space with a diffusion-based prior for text-conditioned synthesis. While rectified flow objectives have recently demonstrated favorable convergence and inference-time properties relative to diffusion in image and audio generation, it remains unclear whether these advantages transfer cleanly to the motion generation setting. In this work, we conduct a controlled empirical study comparing diffusion and rectified flow objectives within the MotionGPT3 framework. By holding the model architecture, training protocol, and evaluation setup fixed, we isolate the effect of the generative objective on training dynamics, final performance, and inference efficiency. Experiments on the…
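The two objectives being compared differ mainly in how a noisy latent is constructed and what the network regresses. The sketch below shows one common form of each, assuming a DDPM-style noise-prediction target for diffusion and a linear-interpolation velocity target for rectified flow. The source does not state which parameterization MotionGPT3 uses, and the function names, latent shape, and `alpha_bar` value here are illustrative assumptions.

```python
import numpy as np

def rectified_flow_pair(z1, noise, t):
    # Linear interpolation between noise (t = 0) and data latent (t = 1).
    # The regression target is the constant velocity along that line.
    z_t = (1.0 - t) * noise + t * z1
    v_target = z1 - noise
    return z_t, v_target

def diffusion_pair(z1, noise, alpha_bar):
    # DDPM-style forward process with cumulative signal level alpha_bar.
    # The regression target (in the noise-prediction variant) is the noise.
    z_t = np.sqrt(alpha_bar) * z1 + np.sqrt(1.0 - alpha_bar) * noise
    eps_target = noise
    return z_t, eps_target

def mse(pred, target):
    # Both objectives reduce to a mean-squared error on their target.
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
z1 = rng.standard_normal((4, 8))      # batch of hypothetical motion latents
noise = rng.standard_normal((4, 8))
t = rng.uniform(size=(4, 1))          # one time per batch element

z_t_flow, v_target = rectified_flow_pair(z1, noise, t)
z_t_diff, eps_target = diffusion_pair(z1, noise, alpha_bar=0.5)

# A network would predict v_target (flow) or eps_target (diffusion) from
# the noisy latent, the time, and the text condition; zeros stand in here.
print(mse(np.zeros_like(v_target), v_target))
print(mse(np.zeros_like(eps_target), eps_target))
```

With the architecture and training loop held fixed, swapping one pair-construction function for the other is the kind of change a controlled comparison of objectives isolates.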
