IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning
Yuanhang Li, Yiren Song, Junzhe Bai, Xinran Liang, Hu Yang, Libiao Jin, Qi Mao

TL;DR
IC-Effect is a novel framework that enables precise, efficient, and temporally consistent video effects editing using in-context learning and a two-stage training process, even with limited data.
Contribution
It introduces a DiT-based in-context learning approach with spatiotemporal sparse tokenization and a two-stage training strategy for high-quality, controllable video effects editing.
Findings
Achieves seamless effect blending with background preservation.
Demonstrates high fidelity with reduced computation.
Outperforms existing methods in quality and efficiency.
Abstract
We propose IC-Effect, an instruction-guided, DiT-based framework for few-shot video VFX editing that synthesizes complex effects (e.g., flames, particles and cartoon characters) while strictly preserving spatial and temporal consistency. Video VFX editing is highly challenging because injected effects must blend seamlessly with the background, the background must remain entirely unchanged, and effect patterns must be learned efficiently from limited paired data. However, existing video editing models fail to satisfy these requirements. IC-Effect leverages the source video as clean contextual conditions, exploiting the contextual learning capability of DiT models to achieve precise background preservation and natural effect injection. A two-stage training strategy, consisting of general editing adaptation followed by effect-specific learning via Effect-LoRA, ensures strong…
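The abstract names Effect-LoRA for the effect-specific stage but does not give layer-level details here. As a rough illustration of the generic low-rank adaptation (LoRA) idea only, and not the authors' implementation, the sketch below adds a trainable low-rank update to a frozen linear layer. All names (`init_lora`, `lora_linear`, `down`, `up`, `alpha`) and the toy shapes are assumptions made for this example.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def init_lora(d_in, d_out, rank, seed=0):
    """Create LoRA factors: small random down-projection, zero up-projection.

    The zero up-projection means the adapted layer initially matches the
    frozen base layer exactly.
    """
    rng = random.Random(seed)
    down = [[rng.gauss(0.0, 0.02) for _ in range(rank)] for _ in range(d_in)]
    up = [[0.0] * d_out for _ in range(rank)]
    return down, up

def lora_linear(x, base_w, down, up, alpha=1.0):
    """Compute x @ base_w + (alpha / rank) * (x @ down) @ up.

    base_w stays frozen; only down and up would be trained
    (hypothetically, one such pair per effect).
    """
    rank = len(up)
    scale = alpha / rank
    base_out = matmul(x, base_w)
    delta = matmul(matmul(x, down), up)
    return [[b + scale * d for b, d in zip(b_row, d_row)]
            for b_row, d_row in zip(base_out, delta)]

# Toy example: d_in = 3, d_out = 2, rank = 2 (illustrative sizes only).
base_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [[1.0, 2.0, 3.0]]
down, up = init_lora(3, 2, rank=2)

# Before any training the update is zero, so the output equals the base layer.
y_initial = lora_linear(x, base_w, down, up)

# Simulate a trained adapter by setting the up-projection to nonzero values.
trained_up = [[1.0, 0.0], [0.0, 1.0]]
y_adapted = lora_linear(x, base_w, down, trained_up)
```

Because only the two small factors change, a separate adapter per effect can in principle be stored and swapped cheaply, which is one reason low-rank adapters suit settings with limited paired data.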
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Multimodal Machine Learning Applications
