Reg-DPO: SFT-Regularized Direct Preference Optimization with GT-Pair for Improving Video Generation
Jie Du, Xinyu Gong, Qingshan Tan, Wen Li, Yangming Cheng, Weitao Wang, Chenlu Zhan, Suhui Wu, Hao Zhang, Jun Zhang

TL;DR
Reg-DPO is a training framework for video generation that uses real videos as the preferred samples in preference pairs, regularizes DPO training with an SFT loss, and applies memory-saving techniques to improve quality, stability, and scalability.
Contribution
The paper proposes Reg-DPO, which combines GT-Pair preference data with SFT regularization to improve video generation. It targets three challenges of video DPO: costly data construction, unstable training, and heavy memory consumption.
Findings
Outperforms existing methods on multiple datasets.
Achieves nearly three times higher training capacity.
Delivers superior video quality in I2V and T2V tasks.
Abstract
Recent studies have identified Direct Preference Optimization (DPO) as an efficient and reward-free approach to improving video generation quality. However, existing methods largely follow image-domain paradigms and are mainly developed on small-scale models (approximately 2B parameters), limiting their ability to address the unique challenges of video tasks, such as costly data construction, unstable training, and heavy memory consumption. To overcome these limitations, we introduce GT-Pair, which automatically builds high-quality preference pairs by using real videos as positives and model-generated videos as negatives, eliminating the need for any external annotation. We further present Reg-DPO, which incorporates the SFT loss as a regularization term into the DPO loss to enhance training stability and generation fidelity. Additionally, by combining the FSDP framework with multiple…
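The abstract as shown here is truncated and gives no formula, so the following is only a minimal sketch of what "SFT loss as a regularization term in the DPO loss" could look like for a single GT-Pair. It assumes a Diffusion-DPO-style objective computed from per-sample denoising errors. The function name `reg_dpo_loss`, its arguments, and the weights `beta` and `sft_weight` are hypothetical and are not taken from the paper.

```python
import math


def reg_dpo_loss(err_pos, err_neg, ref_err_pos, ref_err_neg,
                 beta=1.0, sft_weight=1.0):
    """Illustrative SFT-regularized DPO loss for one preference pair.

    Assumptions (not from the paper):
      err_pos / err_neg         denoising error of the trained model on the
                                real (positive) / generated (negative) video
      ref_err_pos / ref_err_neg the same errors under a frozen reference model
      beta, sft_weight          hypothetical scaling hyperparameters
    """
    # Margin is positive when the trained model improves on the real video
    # more than on the generated one, relative to the reference model.
    margin = (ref_err_pos - err_pos) - (ref_err_neg - err_neg)

    # DPO term: -log(sigmoid(beta * margin)), written as a numerically
    # stable softplus of x = -beta * margin.
    x = -beta * margin
    dpo_term = max(x, 0.0) + math.log1p(math.exp(-abs(x)))

    # SFT regularizer: the ordinary denoising error on the real video.
    sft_term = err_pos

    return dpo_term + sft_weight * sft_term


# The real video is reconstructed better than the generated one, so the
# loss is lower than in the reversed case.
good = reg_dpo_loss(err_pos=0.2, err_neg=0.8, ref_err_pos=0.5, ref_err_neg=0.5)
bad = reg_dpo_loss(err_pos=0.8, err_neg=0.2, ref_err_pos=0.5, ref_err_neg=0.5)
print(good < bad)
```

In this sketch the SFT term keeps pulling the model toward the real video even when the preference margin is already large, which is one plausible reading of how the regularizer could help training stability.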
Taxonomy
Topics: Image and Video Quality Assessment · Video Coding and Compression Technologies · Stochastic Gradient Optimization Techniques
