VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
Runtao Liu, Haoyu Wu, Zheng Ziqiang, Chen Wei, Yingqing He, Renjie Pi, Qifeng Chen

TL;DR
VideoDPO introduces a comprehensive preference alignment method for video diffusion models. It adapts Direct Preference Optimization (DPO) to video and ranks samples with an OmniScore that balances visual quality and text-video semantic alignment, improving both.
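To make "adapting DPO" concrete, the following is a minimal sketch of a per-pair loss in the style of Diffusion-DPO, a known adaptation of DPO to image diffusion models. In this form, each sample's likelihood is replaced by its denoising error. This page does not confirm that VideoDPO uses exactly this form, so treat it as an assumption. The function names and the `beta` value are illustrative, and `beta` here absorbs any timestep-dependent weighting. For video, each error would be a mean over all frames and channels of the noised latent.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def diffusion_dpo_loss(err_theta_w: float, err_ref_w: float,
                       err_theta_l: float, err_ref_l: float,
                       beta: float) -> float:
    """Per-pair Diffusion-DPO-style loss (illustrative sketch, not VideoDPO's code).

    err_*: mean squared error between the sampled noise and the noise predicted
    by the trainable model (theta) or the frozen reference model (ref), for the
    preferred (w) and rejected (l) video noised at the same timestep.
    """
    # Change in error relative to the reference on each video
    # (negative means the trainable model denoises it better).
    delta_w = err_theta_w - err_ref_w
    delta_l = err_theta_l - err_ref_l
    # The logit is positive when the model improves more on the preferred video.
    logit = -beta * (delta_w - delta_l)
    # -log(sigmoid(logit)) is computed as softplus(-logit) for stability.
    return softplus(-logit)


# Identical errors carry no preference signal, so the loss equals log 2.
neutral = diffusion_dpo_loss(0.1, 0.1, 0.1, 0.1, beta=1000.0)
# A lower error than the reference on the preferred video pushes the loss toward 0.
improved = diffusion_dpo_loss(0.09, 0.1, 0.1, 0.1, beta=1000.0)
```

Comparing against a frozen reference keeps the trainable model from drifting far from the pre-trained one, since the loss only rewards changes relative to the reference.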
Contribution
It pioneers the adaptation of DPO to video diffusion models, introduces the OmniScore for balanced preference alignment, and builds an automatic pipeline for collecting preference-pair training data.
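One way to picture the automatic data collection step is as three operations. First, sample several videos per prompt from the pre-trained model. Next, score each video. Finally, keep the highest- and lowest-scoring videos as the preferred and rejected sides of a pair. The sketch below is a minimal version under that assumption. The best-vs-worst selection rule, `n_samples`, and the `generate`/`score` callables are hypothetical placeholders, not the authors' pipeline. The toy stand-ins at the end exist only to make the sketch runnable.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class PreferencePair:
    prompt: str
    chosen: object     # preferred video (any representation)
    rejected: object   # dispreferred video
    score_gap: float   # score(chosen) - score(rejected)


def collect_pairs(prompts: Sequence[str],
                  generate: Callable[[str, int], object],
                  score: Callable[[str, object], float],
                  n_samples: int = 4) -> List[PreferencePair]:
    """Build best-vs-worst preference pairs from model samples (hypothetical sketch)."""
    if n_samples < 2:
        raise ValueError("need at least two samples per prompt")
    pairs = []
    for prompt in prompts:
        # Sample several candidates for the same prompt, one per seed.
        videos = [generate(prompt, seed) for seed in range(n_samples)]
        scores = [score(prompt, v) for v in videos]
        best = max(range(n_samples), key=scores.__getitem__)
        worst = min(range(n_samples), key=scores.__getitem__)
        # Ties carry no preference signal, so the prompt is skipped.
        if scores[best] == scores[worst]:
            continue
        pairs.append(PreferencePair(prompt, videos[best], videos[worst],
                                    scores[best] - scores[worst]))
    return pairs


# Toy stand-ins: a "video" is a string and its score is its seed number.
def toy_generate(prompt: str, seed: int) -> str:
    return f"{prompt}#{seed}"


def toy_score(prompt: str, video: str) -> float:
    return float(video.rsplit("#", 1)[1])


toy_pairs = collect_pairs(["a cat surfing", "a dog running"], toy_generate, toy_score)
```

Each pair stores its score gap. A later re-weighting step can use the gap without re-scoring the videos.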
Findings
Enhanced visual quality in generated videos
Improved semantic alignment with text prompts
Re-weighting preference pairs boosts overall alignment
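Re-weighting means that preference pairs no longer contribute equally to the training loss. This page does not say how VideoDPO assigns the weights. The sketch below therefore uses a purely hypothetical rule: each pair's weight is proportional to the score gap between its preferred and rejected video, rescaled so the weights average to 1. `gap_weights` and `weighted_pair_loss` are illustrative names.

```python
from typing import List, Sequence


def gap_weights(score_gaps: Sequence[float]) -> List[float]:
    """Hypothetical weights proportional to each pair's score gap, rescaled to mean 1."""
    if any(g < 0 for g in score_gaps):
        raise ValueError("score gaps must be non-negative")
    if not score_gaps:
        return []
    mean_gap = sum(score_gaps) / len(score_gaps)
    if mean_gap == 0:
        # No gap information: fall back to uniform weights.
        return [1.0] * len(score_gaps)
    return [g / mean_gap for g in score_gaps]


def weighted_pair_loss(pair_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Average of per-pair losses, each scaled by its weight."""
    if not pair_losses or len(pair_losses) != len(weights):
        raise ValueError("need equally many non-empty losses and weights")
    return sum(w * l for w, l in zip(weights, pair_losses)) / len(pair_losses)


weights = gap_weights([0.1, 0.3])                      # roughly [0.5, 1.5]
batch_loss = weighted_pair_loss([0.6, 0.2], weights)   # roughly 0.3
```

Rescaling to a mean of 1 keeps the overall loss magnitude comparable to unweighted training. As a result, introducing weights should not by itself force a different learning rate.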
Abstract
Recent progress in generative diffusion models has greatly advanced text-to-video generation. While text-to-video models trained on large-scale, diverse datasets can produce varied outputs, these generations often deviate from user preferences, highlighting the need for preference alignment on pre-trained models. Although Direct Preference Optimization (DPO) has demonstrated significant improvements in language and image generation, we pioneer its adaptation to video diffusion models and propose a VideoDPO pipeline by making several key adjustments. Unlike previous image alignment methods that focus solely on either (i) visual quality or (ii) semantic alignment between text and videos, we comprehensively consider both dimensions and construct a preference score accordingly, which we term the OmniScore. We design a pipeline to automatically collect preference pair data based on the…
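According to the abstract, the OmniScore combines visual quality and text-video semantic alignment into a single preference score. The sketch below assumes that each video already has one or more sub-scores per dimension, all in a common range such as [0, 1]. It blends the two dimension means with a single weight. The sub-score sources, the `quality_weight` parameter, and the equal default weighting are assumptions, not the paper's definition.

```python
from statistics import fmean
from typing import Sequence


def omni_score(quality_scores: Sequence[float],
               semantic_scores: Sequence[float],
               quality_weight: float = 0.5) -> float:
    """Blend visual-quality and text-video alignment sub-scores (hypothetical OmniScore)."""
    if not quality_scores or not semantic_scores:
        raise ValueError("need at least one sub-score per dimension")
    if not 0.0 <= quality_weight <= 1.0:
        raise ValueError("quality_weight must lie in [0, 1]")
    # Average within each dimension first, so that the number of sub-scores
    # in one dimension does not tilt the balance between dimensions.
    quality = fmean(quality_scores)
    semantic = fmean(semantic_scores)
    return quality_weight * quality + (1.0 - quality_weight) * semantic


# Video A looks sharp but ignores the prompt; video B is decent on both.
score_a = omni_score([0.9, 0.9], [0.3])  # about 0.6
score_b = omni_score([0.7, 0.7], [0.7])  # about 0.7
```

A quality-only ranking would prefer video A, while the blended score prefers video B. This illustrates the single-dimension focus the abstract attributes to earlier image alignment methods.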
Taxonomy
Topics: Video Coding and Compression Technologies · Multimedia Communication and Technology · Advanced Vision and Imaging
Methods: Diffusion · Focus
