VEFX-Bench: A Holistic Benchmark for Generic Video Editing and Visual Effects

Xiangbo Gao; Sicong Jiang; Bangya Liu; Xinghao Chen; Minglai Yang; Siyuan Yang; Mingyang Wu; Jiongze Yu; Qi Zheng; Haozhi Wang; Jiayi Zhang; Jie Yang; Zihan Wang; Qing Yin; and Zhengzhong Tu

arXiv:2604.16272 · cs.CV · April 21, 2026

TL;DR

VEFX-Bench introduces a comprehensive dataset, a specialized reward model, and a benchmark for evaluating AI-assisted video editing systems, addressing the lack of standardized evaluation tools.

Contribution

The paper presents VEFX-Dataset, VEFX-Reward, and VEFX-Bench, enabling standardized, human-aligned assessment of video editing quality and system performance.

Findings

01. VEFX-Reward correlates better with human judgments than existing models.

02. Benchmarking reveals that current systems struggle with visual plausibility and instruction adherence.

03. The dataset covers 5,049 examples across 9 major editing categories and 32 subcategories, each labeled for Instruction Following, Rendering Quality, and Edit Exclusivity.
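
As a rough illustration of what "correlates with human judgments" can mean in practice, the sketch below computes a Spearman rank correlation between a set of human scores and a set of model scores. The choice of Spearman correlation is an assumption made for illustration; this summary does not state which agreement metric the paper reports, and the function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(human_scores, model_scores):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(human_scores) != len(model_scores) or len(human_scores) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(human_scores), average_ranks(model_scores))
```

A value near 1 would mean the evaluator orders edited videos much as human annotators do; a value near 0 would mean the two orderings are unrelated.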

Abstract

As AI-assisted video creation becomes increasingly practical, instruction-guided video editing has become essential for refining generated or captured footage to meet professional requirements. Yet the field still lacks both a large-scale human-annotated dataset with complete editing examples and a standardized evaluator for comparing editing systems. Existing resources are limited by small scale, missing edited outputs, or the absence of human quality labels, while current evaluation often relies on expensive manual inspection or generic vision-language model judges that are not specialized for editing quality. We introduce VEFX-Dataset, a human-annotated dataset containing 5,049 video editing examples across 9 major editing categories and 32 subcategories, each labeled along three decoupled dimensions: Instruction Following, Rendering Quality, and Edit Exclusivity. Building on…
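
To make the annotation scheme concrete, here is a minimal sketch of how one dataset example with its three decoupled labels might be represented and aggregated. The field names, the numeric score type, and the helper `mean_scores_by_category` are all hypothetical; the abstract names the three dimensions but does not specify a file format or score scale.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class EditExample:
    """One editing example; all field names are illustrative assumptions."""
    source_video: str
    instruction: str
    edited_video: str
    category: str       # one of the 9 major editing categories
    subcategory: str    # one of the 32 subcategories
    instruction_following: float
    rendering_quality: float
    edit_exclusivity: float

    def scores(self):
        """Return the three decoupled labels as a dict."""
        return {
            "instruction_following": self.instruction_following,
            "rendering_quality": self.rendering_quality,
            "edit_exclusivity": self.edit_exclusivity,
        }


def mean_scores_by_category(examples):
    """Average each dimension separately within each category."""
    grouped = defaultdict(list)
    for example in examples:
        grouped[example.category].append(example.scores())
    return {
        category: {
            dim: mean(row[dim] for row in rows) for dim in rows[0]
        }
        for category, rows in grouped.items()
    }
```

Keeping the three dimensions as separate fields, rather than a single overall score, mirrors the decoupled labeling described in the abstract: an edit can follow the instruction well and still render poorly or alter regions it should have left alone.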

Code & Models

Repositories

https://xiangbogaobarry.github.io/VEFX-Bench

Models

viskoplatform/VEFX-Reward-32B (Hugging Face model)