FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing

Lingling Cai; Kang Zhao; Hangjie Yuan; Yingya Zhang; Shiwei Zhang; Kejie Huang

arXiv:2409.20500·cs.CV·October 1, 2024

Open Access

TL;DR

FreeMask introduces a novel mask selection method for zero-shot video editing that improves semantic fidelity and temporal consistency by addressing variability in cross-attention masks across models and timesteps.

Contribution

The paper proposes Mask Matching Cost (MMC) and FreeMask, a new approach for selecting optimal attention masks, enhancing zero-shot video editing without additional control or fine-tuning.

Findings

01 FreeMask outperforms state-of-the-art methods in semantic fidelity.

02 It improves temporal consistency and editing quality.

03 The approach is adaptable to existing frameworks without extra parameters.

Abstract

Text-to-video diffusion models have made remarkable advancements. Driven by their ability to generate temporally coherent videos, research on zero-shot video editing using these fundamental models has expanded rapidly. To enhance editing quality, structural controls are frequently employed in video editing. Among these techniques, cross-attention mask control stands out for its effectiveness and efficiency. However, when cross-attention masks are naively applied to video editing, they can introduce artifacts such as blurring and flickering. Our experiments uncover a critical factor overlooked in previous video editing research: cross-attention masks are not consistently clear but vary with model structure and denoising timestep. To address this issue, we propose the metric Mask Matching Cost (MMC) that quantifies this variability and propose FreeMask, a method for selecting optimal…
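As a rough illustration of the variability the abstract describes, the sketch below thresholds two toy cross-attention maps into binary masks and compares them. It is not the paper's Mask Matching Cost, whose definition does not appear in this excerpt; the helper names `binarize` and `iou`, the 0.5 threshold, and the 3×3 maps standing in for an early and a late denoising timestep are all assumptions made for illustration.

```python
from typing import List

Map = List[List[float]]
Mask = List[List[int]]


def binarize(attn: Map, threshold: float = 0.5) -> Mask:
    """Min-max normalize an attention map, then threshold it into a 0/1 mask."""
    flat = [v for row in attn for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # guard against a constant map
    return [[1 if (v - lo) / span >= threshold else 0 for v in row] for row in attn]


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two binary masks of the same shape."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 1.0


# Hypothetical attention maps for the same text token at two timesteps.
# The "early" map is nearly flat (unclear mask); the "late" map is sharply peaked.
early = [[0.30, 0.36, 0.32],
         [0.31, 0.40, 0.33],
         [0.30, 0.34, 0.37]]
late = [[0.05, 0.10, 0.05],
        [0.08, 0.90, 0.85],
        [0.04, 0.80, 0.95]]

mask_early = binarize(early)
mask_late = binarize(late)
overlap = iou(mask_early, mask_late)
print(mask_early, mask_late, overlap)
```

A low overlap between the two masks is one simple way to see that a mask taken from an arbitrary timestep may not match the object region, which is the kind of mismatch a mask-selection method would need to account for.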

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis

Methods: Softmax · Attention Is All You Need · Diffusion