VideoDPO: Omni-Preference Alignment for Video Diffusion Generation

Runtao Liu; Haoyu Wu; Zheng Ziqiang; Chen Wei; Yingqing He; Renjie Pi; Qifeng Chen

arXiv:2412.14167·cs.CV·December 19, 2024

TL;DR

VideoDPO is a preference alignment method for video diffusion models. It adapts Direct Preference Optimization (DPO) to video and uses an OmniScore that accounts for both visual quality and semantic alignment, improving generated videos on both dimensions.

Contribution

The paper adapts DPO to video diffusion models, defines an OmniScore that balances multiple preference aspects, and develops a pipeline that automatically collects preference-pair data for training.

Findings

01 Enhanced visual quality in generated videos
02 Improved semantic alignment with text prompts
03 Re-weighting preference pairs boosts overall alignment
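
To make the re-weighting finding concrete, here is a minimal sketch of the standard DPO loss for a single preference pair, with an optional per-pair weight. The log-probability form is the generic DPO objective, not necessarily the exact diffusion-specific variant used in the paper, and the function name `dpo_pair_loss`, the `beta` default, and the way `weight` enters the loss are all assumptions made for illustration.

```python
import math


def dpo_pair_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose,
                  beta=0.1, weight=1.0):
    """Generic DPO loss for one (preferred, rejected) pair.

    All names and defaults here are illustrative assumptions, not the
    paper's implementation. `weight` is a hypothetical per-pair
    re-weighting factor that simply scales the loss.
    """
    # How much more the policy favors the preferred sample than the
    # reference model does, relative to the rejected sample.
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return weight * loss


# Policy identical to the reference: the loss equals log(2).
print(dpo_pair_loss(-5.0, -5.0, -5.0, -5.0))
# Policy favors the preferred sample more than the reference does: lower loss.
print(dpo_pair_loss(-4.0, -6.0, -5.0, -5.0))
```

Under this sketch, a pair with a larger `weight` contributes proportionally more to the training signal, which is one simple way a re-weighting scheme could be wired in.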

Abstract

Recent progress in generative diffusion models has greatly advanced text-to-video generation. While text-to-video models trained on large-scale, diverse datasets can produce varied outputs, these generations often deviate from user preferences, highlighting the need for preference alignment on pre-trained models. Although Direct Preference Optimization (DPO) has demonstrated significant improvements in language and image generation, we pioneer its adaptation to video diffusion models and propose a VideoDPO pipeline by making several key adjustments. Unlike previous image alignment methods that focus solely on either (i) visual quality or (ii) semantic alignment between text and videos, we comprehensively consider both dimensions and construct a preference score accordingly, which we term the OmniScore. We design a pipeline to automatically collect preference pair data based on the…
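
The abstract describes scoring generated videos on both visual quality and semantic alignment and then collecting preference pairs automatically. The truncated text does not say how the two dimensions are combined or how pairs are selected, so the following is only a sketch under stated assumptions: the `Sample` class, the equal-weight average in `omni_score`, and the best-versus-worst selection in `build_pair` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One generated video for a prompt, with hypothetical scores in [0, 1]."""
    video_id: str
    visual_quality: float
    semantic_alignment: float


def omni_score(sample, w_visual=0.5, w_semantic=0.5):
    # Assumption: a weighted average of the two dimensions. The paper's
    # actual OmniScore may combine its components differently.
    return w_visual * sample.visual_quality + w_semantic * sample.semantic_alignment


def build_pair(samples):
    """Pick the highest- and lowest-scoring samples as (winner, loser).

    Also returns the score gap, which a training loop could use as a
    per-pair weight (again an assumption, not the paper's stated rule).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to form a preference pair")
    ranked = sorted(samples, key=omni_score, reverse=True)
    winner, loser = ranked[0], ranked[-1]
    gap = omni_score(winner) - omni_score(loser)
    return winner, loser, gap


candidates = [
    Sample("a", visual_quality=0.9, semantic_alignment=0.8),
    Sample("b", visual_quality=0.4, semantic_alignment=0.5),
    Sample("c", visual_quality=0.6, semantic_alignment=0.7),
]
winner, loser, gap = build_pair(candidates)
print(winner.video_id, loser.video_id, round(gap, 2))
```

Scoring every candidate with a single combined value means a video that is sharp but off-prompt, or on-prompt but blurry, cannot win a pair on one dimension alone.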

Code & Models

Datasets

chungimungi/VideoDPO-10k
dataset · 288 downloads

Taxonomy

Topics: Video Coding and Compression Technologies · Multimedia Communication and Technology · Advanced Vision and Imaging

Methods: Diffusion · Focus