McSc: Motion-Corrective Preference Alignment for Video Generation with Self-Critic Hierarchical Reasoning
Qiushi Yang, Yingjie Chen, Yuan Yao, Yifang Men, Huaizhuo Liu, Miaomiao Cui

TL;DR
This paper introduces McSc, a reinforcement learning framework that improves text-to-video generation by better modeling human preferences, particularly regarding motion dynamics. It does so through self-critic hierarchical reasoning and by mitigating the bias towards low-motion content.
Contribution
The paper proposes a novel three-stage reinforcement learning framework that combines self-critic hierarchical reasoning with motion correction to improve preference alignment in video generation.
Findings
McSc outperforms existing methods in human preference alignment.
Generated videos exhibit higher motion dynamics and visual quality.
The framework effectively mitigates bias towards low-motion content.
Abstract
Text-to-video (T2V) generation has achieved remarkable progress in producing high-quality videos aligned with textual prompts. However, aligning synthesized videos with nuanced human preferences remains challenging due to the subjective and multifaceted nature of human judgment. Existing video preference alignment methods rely on costly human annotations or on proxy metrics to predict preference, and therefore lack an understanding of the logic behind human preferences. Moreover, they usually align T2V models directly with the overall preference distribution, ignoring potentially conflicting dimensions such as motion dynamics and visual quality, which may bias models towards low-motion content. To address these issues, we present Motion-corrective alignment with Self-critic hierarchical Reasoning (McSc), a three-stage reinforcement learning framework for robust preference modeling and alignment. Firstly,…
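The abstract's point that aligning with the overall preference distribution can bias models towards low-motion content suggests a simple check on a preference dataset. The following is a minimal Python sketch of such a diagnostic, not the authors' method. It assumes each preference pair carries a scalar motion score for the chosen and rejected videos (for example, a mean optical-flow magnitude from some external estimator). The names `PreferencePair` and `low_motion_preference_rate` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreferencePair:
    # Hypothetical record: motion scores of the preferred (chosen) and the
    # rejected video, assumed to come from an external motion estimator.
    chosen_motion: float
    rejected_motion: float


def low_motion_preference_rate(pairs: List[PreferencePair]) -> float:
    """Fraction of non-tied pairs in which the preferred video has less motion.

    Pairs whose motion scores are equal carry no signal about motion bias
    and are skipped. Raises ValueError if no informative pairs remain.
    """
    decisive = [p for p in pairs if p.chosen_motion != p.rejected_motion]
    if not decisive:
        raise ValueError("no pairs with differing motion scores")
    lower = sum(1 for p in decisive if p.chosen_motion < p.rejected_motion)
    return lower / len(decisive)


if __name__ == "__main__":
    data = [
        PreferencePair(chosen_motion=1.0, rejected_motion=3.0),
        PreferencePair(chosen_motion=2.0, rejected_motion=5.0),
        PreferencePair(chosen_motion=4.0, rejected_motion=1.0),
        PreferencePair(chosen_motion=2.0, rejected_motion=2.0),  # tie, skipped
    ]
    print(low_motion_preference_rate(data))
```

A rate well above 0.5 would indicate that the preferred videos in the data tend to be the less dynamic ones, so fitting this preference distribution directly would likely reward reduced motion. Ties are excluded rather than counted as half, so the statistic reflects only pairs where motion actually differs.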
Taxonomy
Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization
