Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices
Nathaniel Cohen, Vladimir Kulikov, Matan Kleiner, Inbar Huberman-Spiegelglas, Tomer Michaeli

TL;DR
Slicedit is a text-guided video editing method that applies a pretrained text-to-image diffusion model to spatiotemporal slices of the video as well as to its frames, producing temporally consistent, structure-preserving edits on videos with strong nonrigid motion.
Contribution
The paper proposes a zero-shot video editing method that applies T2I diffusion models to spatiotemporal slices, improving temporal consistency without explicit correspondence mechanisms.
Findings
Effective editing of real-world videos with preserved motion and structure
Outperforms existing methods in temporal consistency and editing quality
Works across diverse video content and editing tasks
Abstract
Text-to-image (T2I) diffusion models achieve state-of-the-art results in image synthesis and editing. However, leveraging such pretrained models for video editing is considered a major challenge. Many existing works attempt to enforce temporal consistency in the edited video through explicit correspondence mechanisms, either in pixel space or between deep features. These methods, however, struggle with strong nonrigid motion. In this paper, we introduce a fundamentally different approach, which is based on the observation that spatiotemporal slices of natural videos exhibit similar characteristics to natural images. Thus, the same T2I diffusion model that is normally used only as a prior on video frames, can also serve as a strong prior for enhancing temporal consistency by applying it on spatiotemporal slices. Based on this observation, we present Slicedit, a method for text-based…
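To make the term "spatiotemporal slice" concrete, here is a minimal sketch in plain Python. It is an illustration only, not the authors' implementation: the helper names `yt_slice` and `xt_slice`, the nested-list video layout `video[t][y][x]`, and the toy pixel values are all assumptions made for this example. Fixing one spatial coordinate and letting time vary yields a 2D array that can be treated like an image.

```python
from typing import List

Image = List[List[int]]   # a 2D grid of pixel values
Video = List[Image]       # assumed layout: video[t][y][x]


def yt_slice(video: Video, x: int) -> Image:
    """Fix column x; return a 2D slice indexed as [y][t]."""
    height = len(video[0])
    return [[frame[y][x] for frame in video] for y in range(height)]


def xt_slice(video: Video, y: int) -> Image:
    """Fix row y; return a 2D slice indexed as [t][x]."""
    return [list(frame[y]) for frame in video]


# Toy video: 3 frames, 2 rows, 4 columns.
# Each pixel value encodes its own coordinates as 100*t + 10*y + x.
video: Video = [
    [[100 * t + 10 * y + x for x in range(4)] for y in range(2)]
    for t in range(3)
]

print(yt_slice(video, x=1))  # rows follow y, columns follow t
print(xt_slice(video, y=0))  # rows follow t, columns follow x
```

A real pipeline would hold the video in an array or tensor library and slice it by indexing; the nested lists here only keep the sketch dependency-free.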
Taxonomy
Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion
