MoDiPO: text-to-motion alignment via AI-feedback-driven Direct Preference Optimization
Massimiliano Pappa, Luca Collorone, Giovanni Ficarra, Indro Spinelli, Fabio Galasso

TL;DR
MoDiPO introduces an AI-feedback-driven approach to align text-to-motion diffusion models, improving realism and controllability while reducing the need for human preference data.
Contribution
The paper presents MoDiPO, a novel method that uses AI feedback for Direct Preference Optimization to improve the alignment of text-to-motion models, and introduces a new motion-preference dataset.
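The AI-feedback idea can be illustrated by showing how preference pairs might be assembled without human annotators: sample several motions per prompt, score each with an automatic evaluator, and keep the highest- and lowest-scored samples as the chosen/rejected pair. This is a minimal sketch under stated assumptions. `build_preference_pairs`, `toy_generate`, `toy_score`, `num_samples`, `min_margin`, and the best-vs-worst pairing rule are all hypothetical placeholders, because this page does not say which evaluator or pairing rule MoDiPO uses.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical motion representation: a list of frames, each a list of pose features.
Motion = List[List[float]]


@dataclass
class PreferencePair:
    prompt: str
    chosen: Motion    # highest-scored sample for this prompt
    rejected: Motion  # lowest-scored sample for this prompt
    margin: float     # score(chosen) - score(rejected)


def build_preference_pairs(
    prompts: Sequence[str],
    generate: Callable[[str, int], List[Motion]],  # hypothetical text-to-motion sampler
    score: Callable[[str, Motion], float],          # hypothetical AI evaluator (higher = better)
    num_samples: int = 4,
    min_margin: float = 0.0,
) -> List[PreferencePair]:
    """Rank several generations per prompt with an automatic scorer; keep best vs. worst."""
    pairs: List[PreferencePair] = []
    for prompt in prompts:
        samples = generate(prompt, num_samples)
        # Score each sample once, then sort on the score only (motions are never compared).
        scored = sorted(((score(prompt, m), m) for m in samples), key=lambda t: t[0], reverse=True)
        (best_score, best), (worst_score, worst) = scored[0], scored[-1]
        margin = best_score - worst_score
        # Skip prompts where the evaluator cannot separate the samples.
        if margin > min_margin:
            pairs.append(PreferencePair(prompt, best, worst, margin))
    return pairs


# Toy stand-ins so the sketch runs; a real setup would call a diffusion model and a learned evaluator.
_rng = random.Random(0)


def toy_generate(prompt: str, k: int, frames: int = 8, dims: int = 3) -> List[Motion]:
    # Ignores the prompt and returns k random "motions", each with its own noise level.
    out = []
    for _ in range(k):
        scale = _rng.uniform(0.1, 2.0)
        out.append([[_rng.gauss(0.0, scale) for _ in range(dims)] for _ in range(frames)])
    return out


def toy_score(prompt: str, motion: Motion) -> float:
    # Placeholder "realism" score: a smaller average magnitude scores higher.
    values = [abs(v) for frame in motion for v in frame]
    return -sum(values) / len(values)


if __name__ == "__main__":
    prompts = ["a person walks forward", "a person jumps in place"]
    for p in build_preference_pairs(prompts, toy_generate, toy_score):
        print(p.prompt, round(p.margin, 3))
```

Taking only the extreme samples keeps the pairs maximally separated under the evaluator; other rules (all pairs above a score gap, for instance) are equally plausible and would only change the loop body.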
Findings
Significantly improves motion realism as measured by FID
Maintains comparable R-Precision and multi-modality performance
Reduces reliance on human preference data in DPO
Abstract
Diffusion Models have revolutionized the field of human motion generation by offering exceptional generation quality and fine-grained controllability through natural language conditioning. Their inherent stochasticity, that is, the ability to generate various outputs from a single input, is key to their success. However, this diversity should not be unrestricted, as it may lead to unlikely generations. Instead, it should be confined within the boundaries of text-aligned and realistic generations. To address this issue, we propose MoDiPO (Motion Diffusion DPO), a novel methodology that leverages Direct Preference Optimization (DPO) to align text-to-motion models. We streamline the laborious and expensive process of gathering human preferences needed in DPO by leveraging AI feedback instead. This enables us to experiment with novel DPO strategies, using both online and offline generated…
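For reference, the generic Diffusion-DPO objective (Wallace et al., 2023) adapts DPO to diffusion models by comparing the denoising errors of the model being trained and a frozen reference model on the preferred and rejected samples. The sketch below assumes that form. Whether MoDiPO uses exactly this loss, which timestep weighting it applies, and what value of beta it uses are not stated on this page, so `diffusion_dpo_loss`, `squared_error`, and the default `beta=1.0` are illustrative assumptions.

```python
import math
from typing import Sequence


def squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Sum of squared differences between a denoising prediction and its target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def diffusion_dpo_loss(
    err_w_policy: float,  # denoising error of the trained model on the preferred motion
    err_w_ref: float,     # same error for the frozen reference model
    err_l_policy: float,  # denoising error of the trained model on the rejected motion
    err_l_ref: float,     # same error for the frozen reference model
    beta: float = 1.0,    # illustrative strength; any timestep weighting is assumed folded in
) -> float:
    # A negative delta means the trained model denoises that sample better than the reference.
    delta_w = err_w_policy - err_w_ref
    delta_l = err_l_policy - err_l_ref
    # The loss falls when error drops more on the preferred sample than on the rejected one.
    logit = -beta * (delta_w - delta_l)
    return -log_sigmoid(logit)


if __name__ == "__main__":
    # Trained model identical to the reference: the loss equals log(2).
    print(diffusion_dpo_loss(0.5, 0.5, 0.5, 0.5))
    # Trained model fits the preferred motion better and the rejected one worse: the loss drops.
    print(diffusion_dpo_loss(0.2, 0.5, 0.8, 0.5))
```

In training, each error would come from `squared_error` applied to the model's prediction for a noised sample at a shared random timestep, and the loss would be averaged over a batch of preference pairs.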
Taxonomy
Topics: Human Motion and Animation
Methods: Direct Preference Optimization · ALIGN · Diffusion
