Object-Centric Diffusion for Efficient Video Editing

Kumara Kahatapitiya; Adil Karjauv; Davide Abati; Fatih Porikli; Yuki M. Asano; Amirhossein Habibian

arXiv:2401.05735·cs.CV·September 2, 2024·1 cites

Open Access

TL;DR

This paper introduces Object-Centric Diffusion, a method that speeds up diffusion-based video editing by concentrating computation on the edited foreground regions, reducing latency by up to 10x at comparable quality.

Contribution

It proposes two novel techniques, Object-Centric Sampling and Token Merging, that improve efficiency and correct generation artifacts in diffusion-based video editing without retraining the underlying models.

Findings

01 Achieves up to 10x latency reduction with comparable quality.

02 Reduces memory and computational costs.

03 Applicable to existing models without retraining.

Abstract

Diffusion-based video editing has reached impressive quality and can transform the global style, local structure, and attributes of given video inputs, following textual edit prompts. However, such solutions typically incur heavy memory and computational costs to generate temporally coherent frames, in the form of diffusion inversion and/or cross-frame attention. In this paper, we conduct an analysis of such inefficiencies and suggest simple yet effective modifications that allow significant speed-ups whilst maintaining quality. Moreover, we introduce Object-Centric Diffusion to fix generation artifacts and further reduce latency by allocating more computation to foreground edited regions, which are arguably more important for perceptual quality. We achieve this by two novel proposals: i) Object-Centric Sampling, decoupling the diffusion steps spent on salient or background…
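As a rough illustration of why decoupling diffusion steps between foreground and background can cut cost, the sketch below estimates the relative denoising cost under a toy model. It is not the paper's implementation: the function name `relative_denoising_cost`, the binary foreground mask, the step counts, and the assumption that per-step cost scales linearly with the number of tokens processed are all hypothetical choices made for this example.

```python
def relative_denoising_cost(fg_mask, fg_steps, bg_steps):
    """Estimate cost relative to running fg_steps on every token.

    Toy assumption: each denoising step costs the same per token,
    so attention's quadratic term is deliberately ignored.
    """
    if fg_steps <= 0 or bg_steps < 0:
        raise ValueError("fg_steps must be positive and bg_steps non-negative")
    # Flatten the 2D mask into a list of booleans (True = foreground token).
    flat = [bool(v) for row in fg_mask for v in row]
    if not flat:
        raise ValueError("fg_mask must contain at least one token")
    fg_frac = sum(flat) / len(flat)
    # Foreground tokens get fg_steps, background tokens get bg_steps.
    decoupled = fg_frac * fg_steps + (1.0 - fg_frac) * bg_steps
    return decoupled / fg_steps


# Hypothetical 8x8 token grid whose central 4x4 block is the edited object.
mask = [[1 if 2 <= r < 6 and 2 <= c < 6 else 0 for c in range(8)]
        for r in range(8)]
# Illustrative step counts only; they are not taken from the paper.
ratio = relative_denoising_cost(mask, fg_steps=50, bg_steps=10)
print(f"relative cost: {ratio:.2f}")
```

Under this toy model the saving grows as the edited object gets smaller or as fewer steps go to the background, which matches the abstract's intuition that computation is best spent on foreground edited regions.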

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Video Coding and Compression Technologies

Methods: Knowledge Distillation · Diffusion · Overfitting Conditional Diffusion Model