InstructVideo: Instructing Video Diffusion Models with Human Feedback

Hangjie Yuan; Shiwei Zhang; Xiang Wang; Yujie Wei; Tao Feng; Yining Pan; Yingya Zhang; Ziwei Liu; Samuel Albanie; Dong Ni

arXiv:2312.12490·cs.CV·December 21, 2023·1 cites

TL;DR

InstructVideo fine-tunes text-to-video diffusion models with human feedback through reward fine-tuning. It lowers fine-tuning cost by recasting reward fine-tuning as editing, and it repurposes image reward models to align generated videos with human preferences.

Contribution

The paper presents a reward fine-tuning method that lowers computational cost by running only part of the DDIM sampling chain, and that repurposes image reward models, e.g., HPSv2, to improve video generation quality.

Findings

01 Enhanced video quality with human feedback fine-tuning

02 Reduced fine-tuning computational cost through partial inference

03 Effective use of image reward models for video preference alignment

Abstract

Diffusion models have emerged as the de facto paradigm for video generation. However, their reliance on web-scale data of varied quality often yields results that are visually unappealing and misaligned with the textual prompts. To tackle this problem, we propose InstructVideo to instruct text-to-video diffusion models with human feedback by reward fine-tuning. InstructVideo has two key ingredients: 1) To ameliorate the cost of reward fine-tuning induced by generating through the full DDIM sampling chain, we recast reward fine-tuning as editing. By leveraging the diffusion process to corrupt a sampled video, InstructVideo requires only partial inference of the DDIM sampling chain, reducing fine-tuning cost while improving fine-tuning efficiency. 2) To mitigate the absence of a dedicated video reward model for human preferences, we repurpose established image reward models, e.g., HPSv2.…
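The first ingredient in the abstract (corrupt a sampled video with the diffusion process, then run only part of the DDIM chain) can be illustrated with a minimal sketch. This is not the authors' implementation. The linear beta schedule, the 20-step DDIM chain, the `keep_fraction` of 0.5, and every function name below are assumptions chosen for illustration, and a short list of floats stands in for a video latent.

```python
import math
import random


def alpha_bar_schedule(num_train_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) under an assumed linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_train_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def corrupt(x0, t, alpha_bars, rng):
    """Forward diffusion: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * noise."""
    a = alpha_bars[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


def ddim_timesteps(num_train_steps, num_ddim_steps):
    """Evenly spaced DDIM timesteps in descending (noisy to clean) order."""
    stride = num_train_steps // num_ddim_steps
    return list(range(0, num_train_steps, stride))[::-1]


def partial_chain(timesteps, keep_fraction):
    """Keep only the final, least noisy portion of the sampling chain."""
    n = max(1, round(len(timesteps) * keep_fraction))
    return timesteps[-n:]


rng = random.Random(0)
alpha_bars = alpha_bar_schedule()

full = ddim_timesteps(1000, 20)        # full chain, starts from pure noise
partial = partial_chain(full, 0.5)     # hypothetical choice: keep the last half

# Stand-in for a sampled video latent (hypothetical 8-dimensional vector).
x0 = [rng.uniform(-1.0, 1.0) for _ in range(8)]

# Corrupt the sample to the first timestep of the partial chain. A denoiser
# would then run only over `partial` before the reward is computed, instead
# of over every step in `full`.
x_tau = corrupt(x0, partial[0], alpha_bars, rng)

print(f"full chain: {len(full)} steps, partial chain: {len(partial)} steps")
```

In this sketch the saving comes purely from the shorter chain: the denoiser, and the gradients through it, are evaluated on `len(partial)` steps rather than `len(full)`. How many steps to keep is a tunable assumption here, not a value taken from the paper.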

Code & Models

Repositories

ali-vilab/i2vgen-xl (PyTorch)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image and Video Quality Assessment

Methods: Diffusion