UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs

Lifan Jiang; Tianrun Wu; Yuhang Pei; Chenyang Wang; Boxi Wu; Deng Cai

arXiv:2604.15871 · cs.CV · April 20, 2026

TL;DR

UniEditBench introduces a unified, cost-effective benchmark for evaluating image and video editing methods. It uses distilled multimodal large language models (MLLMs) as evaluators whose scores align with human judgments.

Contribution

It provides a comprehensive, shared protocol and lightweight evaluators for fair comparison across diverse visual editing tasks.

Findings

01

Distilled evaluators maintain strong agreement with human judgments.

02

The benchmark covers a wide range of editing operations and tasks.

03

Evaluation cost is significantly reduced with lightweight models.

Abstract

The evaluation of visual editing models remains fragmented across methods and modalities. Existing benchmarks are often tailored to specific paradigms, which makes fair cross-paradigm comparisons difficult, and video editing lacks reliable evaluation benchmarks. Furthermore, common automatic metrics often misalign with human preference, yet directly deploying multimodal large language models (MLLMs) as evaluators incurs prohibitive computational and financial costs. We present UniEditBench, a unified benchmark for image and video editing that supports reconstruction-based and instruction-driven methods under a shared protocol. UniEditBench includes a structured taxonomy of nine image operations (Add, Remove, Replace, Change, Stroke-based, Extract, Adjust, Count, Reorder) and eight video operations, with coverage of challenging compositional tasks such as counting and spatial reordering. To enable…
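The nine image operations listed in the abstract form a closed taxonomy, so one natural way to tag benchmark samples is with an enumeration. The sketch below is illustrative only and is not taken from the UniEditBench repository: the class name `ImageEditOp`, the set `COMPOSITIONAL_OPS`, and the helpers `parse_op` and `is_compositional` are hypothetical. The eight video operations are not named in this excerpt, so they are deliberately left out rather than guessed.

```python
from enum import Enum


class ImageEditOp(Enum):
    """The nine image editing operations named in the UniEditBench abstract.

    Hypothetical representation; the benchmark's own code may differ.
    """
    ADD = "Add"
    REMOVE = "Remove"
    REPLACE = "Replace"
    CHANGE = "Change"
    STROKE_BASED = "Stroke-based"
    EXTRACT = "Extract"
    ADJUST = "Adjust"
    COUNT = "Count"
    REORDER = "Reorder"


# Operations the abstract highlights as challenging compositional tasks
# (counting and spatial reordering). Hypothetical grouping.
COMPOSITIONAL_OPS = frozenset({ImageEditOp.COUNT, ImageEditOp.REORDER})


def parse_op(label: str) -> ImageEditOp:
    """Map a label such as 'Stroke-based' to its enum member (hypothetical helper).

    Raises ValueError for labels outside the nine-operation taxonomy.
    """
    return ImageEditOp(label)


def is_compositional(op: ImageEditOp) -> bool:
    """Return True if the operation is one of the compositional tasks."""
    return op in COMPOSITIONAL_OPS


if __name__ == "__main__":
    print(len(ImageEditOp))  # 9
    print(is_compositional(parse_op("Count")))  # True
```

Using string values that match the abstract's labels lets sample annotations be validated on load: any label outside the taxonomy raises `ValueError` instead of silently creating a new category.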


Code & Models

Repositories

wesar1/UniEditBench (GitHub)
