HuViDPO: Enhancing Video Generation through Direct Preference Optimization for Human-Centric Alignment
Lifan Jiang, Boxi Wu, Jiahui Zhang, Xiaotong Guan, Shuang Chen

TL;DR
HuViDPO introduces a novel approach integrating Direct Preference Optimization into text-to-video generation, improving human alignment and video quality through a structured loss function, preference datasets, and a first-frame-conditioned strategy.
Contribution
This work is the first to apply Direct Preference Optimization (DPO) to text-to-video (T2V) tasks, developing a new loss function, constructing preference datasets, and employing a first-frame-conditioned strategy for better video generation.
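For orientation, the general DPO objective that this line of work builds on can be sketched for a single preference pair as below. This is the standard DPO loss, not the HuViDPO loss itself, which this summary does not reproduce. The function name `dpo_loss`, its argument names, and the default `beta` are illustrative assumptions. In diffusion settings the log-likelihood terms are usually replaced by a tractable surrogate, and that substitution is not shown here.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_w / logp_l: log-likelihood of the preferred / rejected sample
    under the model being trained. ref_logp_*: the same quantities under
    a frozen reference model. All names here are hypothetical.
    """
    # How much more the trained model favours the preferred sample
    # than the reference model does, scaled by beta.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the trained model and the reference model agree, the margin is zero and the loss equals log 2. The loss falls as the trained model shifts probability toward the preferred sample relative to the reference.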
Findings
Enhanced alignment of generated videos with human preferences.
Improved video quality with reduced training costs.
Flexible video generation guided by initial frame information.
Abstract
With the rapid development of AIGC technology, significant progress has been made in diffusion-model-based text-to-image (T2I) and text-to-video (T2V) generation. In recent years, a few studies have introduced Direct Preference Optimization (DPO) into T2I tasks, significantly improving the alignment of generated images with human preferences. However, existing T2V generation methods lack a well-formed pipeline with an exact loss function to guide the alignment of generated videos with human preferences using DPO strategies. Additionally, the scarcity of paired video preference data hinders effective model training. At the same time, the lack of training datasets risks insufficient flexibility and poor quality in the generated videos. To address these problems, our work proposes three targeted solutions in sequence. 1) Our work is the first to…
Taxonomy
Topics: Face recognition and analysis · Human Motion and Animation
Methods: Softmax · Attention Is All You Need · Direct Preference Optimization · Diffusion · ADaptive gradient method with the OPTimal convergence rate · ALIGN
