MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation

Haibo Tong, Zhaoyang Wang, Zhaorun Chen, Haonian Ji, Shi Qiu, Siwei Han, Kexin Geng, Zhongkai Xue, Yiyang Zhou, Peng Xia, Mingyu Ding, Rafael Rafailov, Chelsea Finn, and Huaxiu Yao

arXiv:2502.01719 · cs.CV · February 10, 2025

TL;DR

This paper introduces MJ-BENCH-VIDEO, a comprehensive benchmark for evaluating video generation quality across multiple aspects, and proposes MJ-VIDEO, a new reward model that improves preference judgment accuracy and enhances video alignment.

Contribution

The paper presents a large-scale, fine-grained video preference benchmark and a novel MoE-based reward model that outperforms existing methods in preference assessment and tuning.

Findings

01. MJ-VIDEO achieves a 17.58% improvement in overall preference judgment.

02. The benchmark covers five critical aspects with 28 fine-grained criteria.

03. MJ-VIDEO enhances video alignment in generation tasks.

Abstract

Recent advancements in video generation have significantly improved the ability to synthesize videos from text instructions. However, existing models still struggle with key challenges such as instruction misalignment, content hallucination, safety concerns, and bias. Addressing these limitations, we introduce MJ-BENCH-VIDEO, a large-scale video preference benchmark designed to evaluate video generation across five critical aspects: Alignment, Safety, Fineness, Coherence & Consistency, and Bias & Fairness. This benchmark incorporates 28 fine-grained criteria to provide a comprehensive evaluation of video preference. Building upon this dataset, we propose MJ-VIDEO, a Mixture-of-Experts (MoE)-based video reward model designed to deliver fine-grained reward. MJ-VIDEO can dynamically select relevant experts to accurately judge the preference based on the input text-video pair. This…
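The abstract says MJ-VIDEO selects relevant experts for each text-video pair, but this excerpt does not describe the architecture. As a rough illustration only, the sketch below shows generic top-k Mixture-of-Experts gating over the five aspects named above, followed by a pairwise preference decision. It is not the authors' implementation: the function names, the choice of top-k softmax gating, the value of `k`, and all numeric inputs are assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence

# The five aspects come from the abstract; everything else here is hypothetical.
ASPECTS: List[str] = [
    "Alignment",
    "Safety",
    "Fineness",
    "Coherence & Consistency",
    "Bias & Fairness",
]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(gate_logits: Sequence[float], k: int) -> Dict[int, float]:
    """Keep the k largest gate logits and renormalize them with softmax.

    Returns a mapping from expert index to its mixing weight.
    """
    order = sorted(range(len(gate_logits)),
                   key=lambda i: gate_logits[i], reverse=True)
    chosen = order[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return dict(zip(chosen, weights))


def moe_reward(gate_logits: Sequence[float],
               expert_scores: Sequence[float],
               k: int = 2) -> float:
    """Combine per-aspect expert scores using only the selected experts."""
    gate = top_k_gate(gate_logits, k)
    return sum(w * expert_scores[i] for i, w in gate.items())


def prefer(reward_a: float, reward_b: float) -> str:
    """Pairwise preference: the video with the higher reward wins."""
    if reward_a > reward_b:
        return "A"
    if reward_b > reward_a:
        return "B"
    return "tie"


# Made-up inputs: in a real model the gate logits would be computed from the
# text-video pair, and each expert score from an aspect-specific head.
gate_logits = [2.0, 0.0, 2.0, -1.0, -1.0]   # favors Alignment and Fineness
scores_a = [0.8, 0.1, 0.6, 0.0, 0.0]        # per-aspect scores for video A
scores_b = [0.4, 0.9, 0.2, 0.0, 0.0]        # per-aspect scores for video B

reward_a = moe_reward(gate_logits, scores_a, k=2)
reward_b = moe_reward(gate_logits, scores_b, k=2)
choice = prefer(reward_a, reward_b)
```

With these inputs the gate keeps only the Alignment and Fineness experts, so video B's high Safety score does not affect its reward. That is the intended effect of sparse gating in general: experts judged irrelevant to a given prompt contribute nothing to the final score.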

Code & Models

Datasets

MJ-Bench/MJ-BENCH-VIDEO
dataset · 67 dl

Taxonomy

Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis · Multimedia Communication and Technology