Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance

Ziyun Zeng; Yiqi Lin; Guoqiang Liang; Mike Zheng Shou

arXiv:2605.06535·cs.CV·May 8, 2026

2 repositories · 1 model · 2 datasets

TL;DR

Sparkle introduces a new large-scale dataset and benchmark for instruction-guided video background replacement, enabling more realistic and temporally consistent scene synthesis.

Contribution

We develop a scalable pipeline for high-quality background guidance data generation and create Sparkle, the largest dataset and benchmark for this task.

Findings

01 Our dataset and model outperform existing baselines on evaluation benchmarks.

02 Sparkle achieves more natural and temporally consistent background replacements.

03 The decoupled guidance approach improves data quality and model performance.

Abstract

In recent years, open-source efforts like Senorita-2M have propelled video editing toward natural language instruction. However, current publicly available datasets predominantly focus on local editing or style transfer, which largely preserve the original scene structure and are easier to scale. In contrast, Background Replacement, a task central to creative applications such as film production and advertising, requires synthesizing entirely new, temporally consistent scenes while maintaining accurate foreground-background interactions, making large-scale data generation significantly more challenging. Consequently, this complex task remains largely underexplored due to a scarcity of high-quality training data. This gap is evident in poorly performing state-of-the-art models, e.g., Kiwi-Edit, because the primary open-source dataset that contains this task, i.e., OpenVE-3M, frequently…

Code & Models

Repositories

Models

stdKonjac/Kiwi-Sparkle-720P-81F (Hugging Face model)

Datasets
