VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing
Jiahao Hu, Tianxiong Zhong, Xuebo Wang, Boyuan Jiang, Xingye Tian, Fei Yang, Pengfei Wan, Di Zhang

TL;DR
This paper introduces VIVID-10M, a large-scale dataset, and VIVID, a versatile and interactive video local editing model that delivers efficient, high-quality edits with state-of-the-art performance.
Contribution
The paper presents the first large-scale hybrid image-video local editing dataset and a new interactive editing model, VIVID, that supports entity addition, modification, and deletion.
Findings
VIVID-10M contains 9.7 million samples covering diverse editing tasks.
The VIVID model achieves state-of-the-art results in video local editing.
Interactive keyframe-guided editing reduces latency and improves user control.
Abstract
Diffusion-based image editing models have made remarkable progress in recent years. However, achieving high-quality video editing remains a significant challenge. One major hurdle is the absence of open-source, large-scale video editing datasets based on real-world data, as constructing such datasets is both time-consuming and costly. Moreover, video data requires a significantly larger number of tokens for representation, which substantially increases the training costs for video editing models. Lastly, current video editing models offer limited interactivity, often making it difficult for users to express their editing requirements effectively in a single attempt. To address these challenges, this paper introduces a dataset VIVID-10M and a baseline model VIVID. VIVID-10M is the first large-scale hybrid image-video local editing dataset aimed at reducing data construction and model…
Taxonomy
Topics: COVID-19 diagnosis using AI · Cell Image Analysis Techniques · Advanced Vision and Imaging
