HuViDPO: Enhancing Video Generation through Direct Preference Optimization for Human-Centric Alignment

Lifan Jiang; Boxi Wu; Jiahui Zhang; Xiaotong Guan; Shuang Chen

arXiv:2502.01690 · cs.CV · February 5, 2025

TL;DR

HuViDPO integrates Direct Preference Optimization (DPO) into text-to-video (T2V) generation. It improves both alignment with human preferences and video quality through a DPO loss function tailored to T2V, purpose-built preference datasets, and a first-frame-conditioned strategy.

Contribution

This work is the first to apply DPO to T2V tasks. It develops a new loss function, constructs preference datasets, and employs a first-frame-conditioned strategy that makes generated videos more flexible and of higher quality.
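The page does not reproduce HuViDPO's loss function. For orientation only, here is a minimal sketch of the Diffusion-DPO objective (Wallace et al., 2023), one of the T2I formulations that a T2V loss could build on. It is not the paper's loss. To adapt it to video, the sketch simply averages the denoising error over every frame. The assumptions are as follows. Videos are represented as lists of frames of flattened latent values. The helper names `video_denoising_error`, `log_sigmoid`, and `diffusion_dpo_loss` are hypothetical. The paper's timestep weighting constants are folded into `beta`.

```python
import math
from typing import Sequence

# Hypothetical layout: a video is a list of frames, each a flat list of latent values.
Video = Sequence[Sequence[float]]


def video_denoising_error(pred_noise: Video, true_noise: Video) -> float:
    """Squared error between predicted and true noise, averaged over all frames and values."""
    total, count = 0.0, 0
    for pred_frame, true_frame in zip(pred_noise, true_noise):
        for p, t in zip(pred_frame, true_frame):
            total += (p - t) ** 2
            count += 1
    return total / count


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def diffusion_dpo_loss(
    err_policy_w: float,
    err_policy_l: float,
    err_ref_w: float,
    err_ref_l: float,
    beta: float,
) -> float:
    """Diffusion-DPO-style loss for one (preferred w, rejected l) video pair.

    err_* are denoising errors under the trainable policy and the frozen reference model.
    Sketch only: timestep weighting from the original paper is folded into beta.
    """
    policy_diff = err_policy_w - err_policy_l
    ref_diff = err_ref_w - err_ref_l
    # If the policy lowers its error on the preferred video more than the reference does,
    # the logit grows and the loss shrinks.
    logit = -beta * (policy_diff - ref_diff)
    return -log_sigmoid(logit)
```

When the policy matches the reference, the logit is 0 and the loss equals log 2. If the policy lowers its error on the preferred video relative to the reference, the loss falls below log 2. If it favors the rejected video, the loss rises above log 2.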

Findings

01

Enhanced alignment of generated videos with human preferences.

02

Improved video quality with reduced training costs.

03

Flexible video generation guided by initial frame information.

Abstract

With the rapid development of AIGC technology, diffusion-based methods for text-to-image (T2I) and text-to-video (T2V) generation have made significant progress. In recent years, a few studies have introduced Direct Preference Optimization (DPO) into T2I tasks, substantially improving how well generated images align with human preferences. However, existing T2V generation methods lack a well-formed pipeline with an exact loss function for using DPO to align generated videos with human preferences. In addition, the scarcity of paired video preference data hinders effective model training. The lack of training data also risks producing videos with limited flexibility and poor quality. To address these problems, our work proposes three targeted solutions. 1) Our work is the first to…
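The abstract names the scarcity of paired video preference data as an obstacle. The page does not describe how HuViDPO builds its preference datasets. One common way to obtain pairs is to rate several videos generated for the same prompt and then pair them up by score. The sketch below is a hypothetical illustration of that idea, not the paper's procedure. `ScoredVideo`, `build_preference_pairs`, and the `min_margin` threshold are all assumed names.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ScoredVideo:
    prompt: str
    video_id: str
    score: float  # hypothetical field: e.g. a human rating of the generated video


def build_preference_pairs(
    samples: List[ScoredVideo], min_margin: float = 1.0
) -> List[Tuple[str, str, str]]:
    """Return (prompt, preferred_id, rejected_id) triples.

    Videos are grouped by prompt. Every pair within a group whose score gap is at least
    min_margin becomes one training pair, and the higher-scored video is the preferred one.
    """
    by_prompt: Dict[str, List[ScoredVideo]] = {}
    for s in samples:
        by_prompt.setdefault(s.prompt, []).append(s)

    pairs: List[Tuple[str, str, str]] = []
    for prompt, group in by_prompt.items():
        for a, b in combinations(group, 2):
            if abs(a.score - b.score) < min_margin:
                continue  # scores too close: skip ambiguous pairs
            win, lose = (a, b) if a.score > b.score else (b, a)
            pairs.append((prompt, win.video_id, lose.video_id))
    return pairs
```

Skipping pairs whose scores fall within `min_margin` trades pair count for label reliability. With k rated videos per prompt, the function yields at most k(k-1)/2 pairs.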

Taxonomy

Topics: Face recognition and analysis · Human Motion and Animation

Methods: Softmax · Attention Is All You Need · Direct Preference Optimization · Diffusion · ADaptive gradient method with the OPTimal convergence rate · ALIGN