Thinking with Frames: Generative Video Distortion Evaluation via Frame Reward Model

Yuan Wang; Borui Liao; Huijuan Huang; Jinda Lu; Ouxiang Li; Kuien Liu; Meng Wang; Xiang Wang

arXiv:2601.04033 · cs.CV · March 27, 2026

Open Access

TL;DR

This paper introduces REACT, a frame-level reward model for evaluating structural distortions in generative videos, supported by a large-scale annotated dataset and a two-stage training process, improving assessment accuracy and interpretability.

Contribution

The paper presents REACT, a frame-level reward model designed for evaluating structural distortions in generative videos, together with a large-scale annotated dataset and a benchmark, REACT-Bench.

Findings

01

REACT effectively detects structural distortions in videos.

02

REACT achieves high correlation with human preferences.

03

The benchmark REACT-Bench provides a standard for distortion evaluation.

Abstract

Recent advances in video reward models and post-training strategies have improved text-to-video (T2V) generation. While these models typically assess visual quality, motion quality, and text alignment, they often overlook key structural distortions, such as abnormal object appearances and interactions, which can degrade the overall quality of the generative video. To address this gap, we introduce REACT, a frame-level reward model designed specifically for structural distortion evaluation in generative videos. REACT assigns point-wise scores and attribution labels by reasoning over video frames, focusing on recognizing distortions. To support this, we construct a large-scale human preference dataset, annotated based on our proposed taxonomy of structural distortions, and generate additional data using an efficient Chain-of-Thought (CoT) synthesis pipeline. REACT is trained with a…
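
To make the idea of point-wise scores with attribution labels concrete, here is a minimal sketch of how such an output could be represented. Everything in it is an assumption made for illustration: the abstract does not specify the output schema, the score range, the label names, or how frame results are combined, so `FrameAssessment`, `summarize`, the [0, 1] scale, and the worst-frame aggregation are all hypothetical and are not REACT's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class FrameAssessment:
    """Hypothetical record for one evaluated frame (not REACT's real schema)."""
    frame_index: int
    # Assumed scale: 0.0 (severely distorted) to 1.0 (no visible distortion).
    score: float
    # Hypothetical label names, loosely based on the abstract's two examples.
    labels: list[str] = field(default_factory=list)


def summarize(frames: list[FrameAssessment]) -> dict:
    """Combine frame results; worst-frame aggregation is an assumption."""
    if not frames:
        raise ValueError("at least one frame assessment is required")
    worst = min(frames, key=lambda f: f.score)
    labels = sorted({label for f in frames for label in f.labels})
    return {
        "video_score": worst.score,
        "worst_frame": worst.frame_index,
        "labels": labels,
    }


# Made-up example values, for illustration only.
example = [
    FrameAssessment(0, 0.9),
    FrameAssessment(8, 0.3, ["abnormal_appearance"]),
    FrameAssessment(16, 0.6, ["abnormal_interaction"]),
]
summary = summarize(example)
```

Taking the minimum frame score reflects one plausible design choice, namely that a single badly distorted frame is enough to spoil a video. A mean or a learned aggregation would be equally reasonable alternatives.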

Taxonomy

Topics: Image and Video Quality Assessment · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization