Thinking with Frames: Generative Video Distortion Evaluation via Frame Reward Model
Yuan Wang, Borui Liao, Huijuan Huang, Jinda Lu, Ouxiang Li, Kuien Liu, Meng Wang, Xiang Wang

TL;DR
This paper introduces REACT, a frame-level reward model for evaluating structural distortions in generative videos, supported by a large-scale annotated dataset and a two-stage training process, improving assessment accuracy and interpretability.
Contribution
The paper presents REACT, a frame-level reward model designed specifically for structural distortion evaluation in generative videos, along with a large-scale annotated dataset and a new benchmark, REACT-Bench.
Findings
REACT effectively detects structural distortions in videos.
REACT achieves high correlation with human preferences.
The benchmark REACT-Bench provides a standard for distortion evaluation.
Abstract
Recent advances in video reward models and post-training strategies have improved text-to-video (T2V) generation. While these models typically assess visual quality, motion quality, and text alignment, they often overlook key structural distortions, such as abnormal object appearances and interactions, which can degrade the overall quality of the generative video. To address this gap, we introduce REACT, a frame-level reward model designed specifically for structural distortion evaluation in generative videos. REACT assigns point-wise scores and attribution labels by reasoning over video frames, focusing on recognizing distortions. To support this, we construct a large-scale human preference dataset, annotated based on our proposed taxonomy of structural distortions, and generate additional data using an efficient Chain-of-Thought (CoT) synthesis pipeline. REACT is trained with a…
Taxonomy
Topics: Image and Video Quality Assessment · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization
