InFusion: Inject and Attention Fusion for Multi Concept Zero-Shot Text-based Video Editing

Anant Khandelwal

arXiv:2308.00135 · cs.CV · August 11, 2023

Open Access

TL;DR

InFusion is a zero-shot video editing framework that uses large pre-trained image diffusion models to enable multi-concept editing with pixel-level control, ensuring temporal consistency without additional training.

Contribution

The paper introduces InFusion, a novel method for zero-shot, multi-concept video editing using feature and attention injection, without requiring model training.

Findings

01. Effective multi-concept editing with temporal consistency.

02. Compatible with existing diffusion models such as Stable Diffusion v1.5.

03. Achieves high-quality, coherent video edits in experiments.

Abstract

Large text-to-image diffusion models have achieved remarkable success in generating diverse, high-quality images. These models have also been leveraged to edit input images simply by changing the text prompt. When they are applied to videos, however, the main challenge is ensuring temporal consistency and coherence across frames. In this paper, we propose InFusion, a framework for zero-shot text-based video editing that leverages large pre-trained image diffusion models. Our framework specifically supports editing multiple concepts, with pixel-level control over the diverse concepts mentioned in the editing prompt. Specifically, we inject the difference between the features obtained with the source and edit prompts from the U-Net residual blocks of the decoder layers. When these are combined with injected attention features, it becomes feasible to query the source contents and scale edited…
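
The abstract is truncated before the exact formulation is given, so the following is only a toy NumPy sketch of the two ideas it names, under stated assumptions. `inject_feature_difference` adds the edit-minus-source difference of decoder residual-block features back onto the source features, gated by a per-pixel `mask` and a `scale` factor; the mask is an assumed way of expressing pixel-level control over one concept, not the paper's definition. `attention_with_injected_map` reuses attention weights computed from source-pass queries and keys while taking values from the edit pass, an attention-map injection in the style of Prompt-to-Prompt that is swapped in here for illustration. All function names, shapes, and parameters are hypothetical.

```python
import numpy as np


def inject_feature_difference(f_src, f_edit, mask, scale=1.0):
    """Blend decoder residual-block features (hypothetical toy formulation).

    Start from the source-prompt features and add the scaled
    edit-minus-source difference only where `mask` is nonzero, so regions
    outside the mask keep the source content.
    f_src, f_edit: arrays of shape (C, H, W); mask: (H, W) with values in [0, 1].
    """
    delta = f_edit - f_src                     # change introduced by the edit prompt
    return f_src + scale * mask[None, :, :] * delta


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_with_injected_map(q_src, k_src, v_edit):
    """Scaled dot-product attention whose weights come from the source pass.

    Assumption (Prompt-to-Prompt-style swap-in): the attention probabilities
    from the source (reconstruction) pass are reused, while the values come
    from the edit pass.
    q_src: (N, d), k_src: (M, d), v_edit: (M, d_v).
    """
    d = q_src.shape[-1]
    attn = softmax(q_src @ k_src.T / np.sqrt(d), axis=-1)  # (N, M) source attention map
    return attn @ v_edit                                   # (N, d_v)


rng = np.random.default_rng(0)
f_src = rng.standard_normal((4, 8, 8))
f_edit = rng.standard_normal((4, 8, 8))
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0                           # hypothetical region for one edited concept
blended = inject_feature_difference(f_src, f_edit, mask, scale=1.0)

q = rng.standard_normal((16, 32))
k = rng.standard_normal((16, 32))
v = rng.standard_normal((16, 32))
out = attention_with_injected_map(q, k, v)
print(blended.shape, out.shape)                # prints (4, 8, 8) (16, 32)
```

With `scale=1.0`, features inside the masked region equal the edit-prompt features and features outside it stay identical to the source features; intermediate `scale` values interpolate between the two. In a real diffusion pipeline such operations would presumably be applied inside the U-Net at each denoising step, which this sketch omits.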

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Advanced Vision and Imaging

Methods: Concatenated Skip Connection · Convolution · Diffusion · Max Pooling · U-Net