OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing

Haoyang He; Jie Wang; Jiangning Zhang; Zhucun Xue; Xingyuan Bu; Qiangpeng Yang; Shilei Wen; Lei Xie

arXiv:2512.07826·cs.CV·December 17, 2025

TL;DR

OpenVE-3M is an open-source, large-scale dataset for instruction-based video editing. It covers eight edit types in two categories, spatially aligned and non-spatially aligned, and is accompanied by the OpenVE-Bench benchmark.

Contribution

We introduce OpenVE-3M, an open-source, large-scale, high-quality dataset for instruction-based video editing, and construct OpenVE-Bench as a unified benchmark for evaluation.

Findings

01

OpenVE-3M surpasses existing open-source datasets in scale, diversity of edit types, instruction length, and overall quality.

02

The OpenVE-Edit model achieves state-of-the-art results on OpenVE-Bench.

03

Our dataset and benchmark facilitate future research in instruction-guided video editing.

Abstract

The quality and diversity of instruction-based image editing datasets are continuously increasing, yet large-scale, high-quality datasets for instruction-based video editing remain scarce. To address this gap, we introduce OpenVE-3M, an open-source, large-scale, and high-quality dataset for instruction-based video editing. It comprises two primary categories: spatially-aligned edits (Global Style, Background Change, Local Change, Local Remove, Local Add, and Subtitles Edit) and non-spatially-aligned edits (Camera Multi-Shot Edit and Creative Edit). All edit types are generated via a meticulously designed data pipeline with rigorous quality filtering. OpenVE-3M surpasses existing open-source datasets in terms of scale, diversity of edit types, instruction length, and overall quality. Furthermore, to address the lack of a unified benchmark in the field, we construct OpenVE-Bench,…
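
The two-level taxonomy named in the abstract can be written down as a small lookup table. The following is a minimal sketch in Python, not the dataset's actual schema: the names `EditCategory`, `EDIT_TAXONOMY`, and `category_of` are hypothetical and chosen only for illustration, and the edit-type strings are copied from the abstract rather than from the released files.

```python
from enum import Enum


class EditCategory(Enum):
    """The two primary categories named in the abstract (hypothetical type)."""

    SPATIALLY_ALIGNED = "spatially-aligned"
    NON_SPATIALLY_ALIGNED = "non-spatially-aligned"


# Edit types as spelled in the abstract, grouped by category.
# The released dataset may use different labels; this mapping is an assumption.
EDIT_TAXONOMY = {
    EditCategory.SPATIALLY_ALIGNED: (
        "Global Style",
        "Background Change",
        "Local Change",
        "Local Remove",
        "Local Add",
        "Subtitles Edit",
    ),
    EditCategory.NON_SPATIALLY_ALIGNED: (
        "Camera Multi-Shot Edit",
        "Creative Edit",
    ),
}


def category_of(edit_type: str) -> EditCategory:
    """Return the category an edit type belongs to, or raise ValueError."""
    for category, edit_types in EDIT_TAXONOMY.items():
        if edit_type in edit_types:
            return category
    raise ValueError(f"unknown edit type: {edit_type!r}")
```

A lookup such as `category_of("Local Add")` returns `EditCategory.SPATIALLY_ALIGNED`, which could be used, for example, to split samples by category before computing per-category statistics.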

Code & Models

Datasets

Lewandofski/OpenVE-3M
dataset · 8.0k dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization