OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing
Haoyang He, Jie Wang, Jiangning Zhang, Zhucun Xue, Xingyuan Bu, Qiangpeng Yang, Shilei Wen, Lei Xie

TL;DR
OpenVE-3M introduces a comprehensive, large-scale dataset for instruction-guided video editing, enabling improved model training and benchmarking in this emerging field.
Contribution
We created OpenVE-3M, the first large-scale, high-quality dataset for instruction-based video editing, and established OpenVE-Bench for standardized evaluation.
Findings
OpenVE-3M surpasses existing open-source datasets in scale, diversity of edit types, instruction length, and overall quality.
The OpenVE-Edit model achieves state-of-the-art results on OpenVE-Bench.
Our dataset and benchmark facilitate future research in instruction-guided video editing.
Abstract
The quality and diversity of instruction-based image editing datasets are continuously increasing, yet large-scale, high-quality datasets for instruction-based video editing remain scarce. To address this gap, we introduce OpenVE-3M, an open-source, large-scale, and high-quality dataset for instruction-based video editing. It comprises two primary categories: spatially-aligned edits (Global Style, Background Change, Local Change, Local Remove, Local Add, and Subtitles Edit) and non-spatially-aligned edits (Camera Multi-Shot Edit and Creative Edit). All edit types are generated via a meticulously designed data pipeline with rigorous quality filtering. OpenVE-3M surpasses existing open-source datasets in terms of scale, diversity of edit types, instruction length, and overall quality. Furthermore, to address the lack of a unified benchmark in the field, we construct OpenVE-Bench,…
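The edit taxonomy named in the abstract can be written down as a small lookup structure. The following is a minimal sketch, not the dataset's actual schema: the names `EditCategory`, `EDIT_TYPES`, and `category_of` are hypothetical and chosen only for illustration; only the category and edit-type labels come from the abstract.

```python
from enum import Enum


class EditCategory(Enum):
    """The two primary categories named in the abstract (enum name is hypothetical)."""
    SPATIALLY_ALIGNED = "spatially-aligned"
    NON_SPATIALLY_ALIGNED = "non-spatially-aligned"


# Edit types as listed in the abstract, grouped by category.
# This mapping is an illustrative assumption, not the released metadata format.
EDIT_TYPES = {
    EditCategory.SPATIALLY_ALIGNED: [
        "Global Style",
        "Background Change",
        "Local Change",
        "Local Remove",
        "Local Add",
        "Subtitles Edit",
    ],
    EditCategory.NON_SPATIALLY_ALIGNED: [
        "Camera Multi-Shot Edit",
        "Creative Edit",
    ],
}


def category_of(edit_type: str) -> EditCategory:
    """Return the category an edit type belongs to, or raise if it is unknown."""
    for category, names in EDIT_TYPES.items():
        if edit_type in names:
            return category
    raise ValueError(f"unknown edit type: {edit_type!r}")
```

For example, `category_of("Local Add")` returns `EditCategory.SPATIALLY_ALIGNED`, while `category_of("Creative Edit")` returns `EditCategory.NON_SPATIALLY_ALIGNED`.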
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization
