Improving Video Generation with Human Feedback

Jie Liu; Gongye Liu; Jiajun Liang; Ziyang Yuan; Xiaokun Liu; Mingwu Zheng; Xiele Wu; Qiulin Wang; Menghan Xia; Xintao Wang; Xiaohong Liu; Fei Yang; Pengfei Wan; Di Zhang; Kun Gai; Yujiu Yang; Wanli Ouyang

arXiv:2501.13918 · cs.CV · October 28, 2025

TL;DR

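This paper improves video generation by incorporating human feedback through a new preference dataset, a reward model (VideoReward), and three alignment algorithms for flow-based models. The targets are smoother motion, better alignment between videos and prompts, and user-adjustable quality trade-offs.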

Contribution

The paper introduces a large-scale human preference dataset, a multi-dimensional reward model, and three novel algorithms for aligning flow-based video generation models with human preferences.
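To make the idea of a multi-dimensional reward model trained on pairwise annotations concrete, here is a minimal sketch of a per-dimension pairwise preference loss. It assumes a Bradley-Terry style objective, which is a common choice for preference data but is not confirmed by this page as the paper's exact loss. The function names and the dimension keys (`visual_quality`, `motion_quality`, `text_alignment`) are hypothetical and used only for illustration.

```python
import math


def bradley_terry_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the preferred video wins (Bradley-Terry assumption)."""
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(margin)) equals softplus(-margin); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def multi_dim_loss(preferred: dict, rejected: dict) -> float:
    """Average the pairwise loss over every annotated dimension."""
    dims = list(preferred.keys())
    return sum(bradley_terry_loss(preferred[d], rejected[d]) for d in dims) / len(dims)


# Hypothetical reward scores for one annotated pair of videos.
scores_preferred = {"visual_quality": 1.2, "motion_quality": 0.4, "text_alignment": 0.9}
scores_rejected = {"visual_quality": 0.3, "motion_quality": 0.6, "text_alignment": 0.1}
pair_loss = multi_dim_loss(scores_preferred, scores_rejected)
```

The loss shrinks as the reward model scores the preferred video higher than the rejected one on each dimension, and it equals log 2 when the two scores are tied.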

Findings

01  VideoReward outperforms existing reward models.

02  Flow-DPO surpasses other training strategies in performance.

03  Flow-NRG enables personalized video quality adjustments.

Abstract

Video generation has achieved significant advances through rectified flow techniques, but issues like unsmooth motion and misalignment between videos and prompts persist. In this work, we develop a systematic pipeline that harnesses human feedback to mitigate these problems and refine the video generation model. Specifically, we begin by constructing a large-scale human preference dataset focused on modern video generation models, incorporating pairwise annotations across multiple dimensions. We then introduce VideoReward, a multi-dimensional video reward model, and examine how annotations and various design choices impact its rewarding efficacy. From a unified reinforcement learning perspective aimed at maximizing reward with KL regularization, we introduce three alignment algorithms for flow-based models. These include two training-time strategies: direct preference optimization for flow…
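The "maximizing reward with KL regularization" perspective mentioned in the abstract is commonly written in the form below. The notation is an assumption for illustration and is not taken from this page: $p_\theta$ is the model being aligned, $p_{\mathrm{ref}}$ a frozen reference model, $r$ the reward, $c$ the prompt, $x$ the generated video, and $\beta > 0$ the regularization strength.

```latex
\max_{\theta} \;
\mathbb{E}_{c}\!\left[
  \mathbb{E}_{x \sim p_\theta(\cdot \mid c)}\big[ r(x, c) \big]
  \;-\; \beta \, D_{\mathrm{KL}}\!\big( p_\theta(\cdot \mid c) \,\|\, p_{\mathrm{ref}}(\cdot \mid c) \big)
\right]
% Standard closed-form optimum of this objective:
% p^{*}(x \mid c) \;\propto\; p_{\mathrm{ref}}(x \mid c) \, \exp\!\big( r(x, c) / \beta \big)
```

A larger $\beta$ keeps the aligned model closer to the reference, while a smaller $\beta$ lets the reward dominate.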

Code & Models

Models

Datasets

KlingTeam/VideoGen-RewardBench
dataset · 129 dl

Videos

Improving Video Generation with Human Feedback · SlidesLive

Taxonomy

Topics: Data Visualization and Analytics

Methods: Diffusion