InFusion: Inject and Attention Fusion for Multi Concept Zero-Shot Text-based Video Editing
Anant Khandelwal

TL;DR
InFusion is a zero-shot video editing framework that uses large pre-trained image diffusion models to enable multi-concept editing with pixel-level control, ensuring temporal consistency without additional training.
Contribution
The paper introduces InFusion, a novel method for zero-shot, multi-concept video editing using feature and attention injection, without requiring model training.
Findings
Effective multi-concept editing with temporal consistency.
Compatible with existing diffusion models like Stable Diffusion v1.5.
Achieves high-quality, coherent video edits in experiments.
Abstract
Large text-to-image diffusion models have achieved remarkable success in generating diverse, high-quality images. Additionally, these models have been successfully leveraged to edit input images by just changing the text prompt. But when these models are applied to videos, the main challenge is to ensure temporal consistency and coherence across frames. In this paper, we propose InFusion, a framework for zero-shot text-based video editing leveraging large pre-trained image diffusion models. Our framework specifically supports editing of multiple concepts with pixel-level control over diverse concepts mentioned in the editing prompt. Specifically, we inject the difference in features obtained with source and edit prompts from U-Net residual blocks of decoder layers. When these are combined with injected attention features, it becomes feasible to query the source contents and scale edited…
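The feature-difference injection described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function name `inject_feature_difference`, the `mask` argument (standing in for pixel-level control over where a concept is edited), and the `scale` parameter are all assumptions introduced here for illustration, and NumPy arrays stand in for U-Net decoder features.

```python
import numpy as np

def inject_feature_difference(src_feat, edit_feat, mask, scale=1.0):
    """Hypothetical sketch: add the scaled (edit - source) feature
    difference to the source features, restricted to a spatial mask.

    src_feat, edit_feat: arrays of shape (C, H, W), standing in for
        decoder residual-block features under the source and edit prompts.
    mask: array of shape (1, H, W) with values in [0, 1] (assumed form
        of pixel-level control; not specified in the abstract).
    scale: strength of the injected edit (assumed parameter).
    """
    diff = edit_feat - src_feat           # what the edit prompt changes
    return src_feat + scale * mask * diff  # mask broadcasts over channels

# Toy usage: 2 channels on a 2x2 grid, editing only the diagonal pixels.
src = np.zeros((2, 2, 2))
edit = np.ones((2, 2, 2))
mask = np.array([[[1.0, 0.0], [0.0, 1.0]]])
out = inject_feature_difference(src, edit, mask, scale=0.5)
```

Under these assumptions, a mask of all ones with `scale=1.0` reproduces the edit-prompt features, while a mask of all zeros leaves the source features untouched.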
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Advanced Vision and Imaging
Methods: Concatenated Skip Connection · Convolution · Diffusion · Max Pooling · U-Net
