FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing
Lingling Cai, Kang Zhao, Hangjie Yuan, Yingya Zhang, Shiwei Zhang, Kejie Huang

TL;DR
FreeMask introduces a novel mask selection method for zero-shot video editing that improves semantic fidelity and temporal consistency by addressing variability in cross-attention masks across models and timesteps.
Contribution
The paper proposes Mask Matching Cost (MMC) and FreeMask, a new approach for selecting optimal attention masks, enhancing zero-shot video editing without additional control or fine-tuning.
Findings
FreeMask outperforms state-of-the-art methods in semantic fidelity.
It improves temporal consistency and editing quality.
The approach is adaptable to existing frameworks without extra parameters.
Abstract
Text-to-video diffusion models have made remarkable advancements. Driven by their ability to generate temporally coherent videos, research on zero-shot video editing using these foundation models has expanded rapidly. To enhance editing quality, structural controls are frequently employed in video editing. Among these techniques, cross-attention mask control stands out for its effectiveness and efficiency. However, when cross-attention masks are naively applied to video editing, they can introduce artifacts such as blurring and flickering. Our experiments uncover a critical factor overlooked in previous video editing research: cross-attention masks are not consistently clear but vary with model structure and denoising timestep. To address this issue, we propose Mask Matching Cost (MMC), a metric that quantifies this variability, and FreeMask, a method for selecting optimal…
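To make the idea of a cross-attention mask concrete, here is a minimal sketch, assuming a single attention head and flattened pixel queries. It derives a binary mask for one text token from a softmax cross-attention map, then picks the least noisy candidate from several (for example, different layers or denoising timesteps). The paper's actual MMC formula is not given in this summary; the `mask_cost` below (1 minus IoU against a reference mask), the reference mask itself, the threshold, and all function names are hypothetical stand-ins chosen for illustration only.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_mask(q: np.ndarray, k: np.ndarray, token_idx: int,
                         threshold: float = 0.5) -> np.ndarray:
    """Binary mask for one text token from a single cross-attention map.

    q: (num_pixels, d) image queries; k: (num_tokens, d) text keys.
    Real models average over heads/layers and reshape to 2D; omitted here.
    """
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (num_pixels, num_tokens)
    m = attn[:, token_idx]                         # attention paid to the token
    m = (m - m.min()) / (m.max() - m.min() + 1e-8)  # rescale to [0, 1]
    return m >= threshold


def mask_cost(mask: np.ndarray, ref: np.ndarray) -> float:
    """HYPOTHETICAL cost: 1 - IoU against a reference mask (not the paper's MMC)."""
    inter = np.logical_and(mask, ref).sum()
    union = np.logical_or(mask, ref).sum()
    # Two empty masks are treated as a perfect match.
    return float(1.0 - inter / union) if union else 0.0


def select_mask(candidates: dict, ref: np.ndarray):
    """Return (key, mask) for the candidate with the lowest hypothetical cost."""
    best = min(candidates, key=lambda key: mask_cost(candidates[key], ref))
    return best, candidates[best]


if __name__ == "__main__":
    ref = np.array([True, True, False, False])
    candidates = {
        "layer_a": np.array([True, False, False, False]),  # blurry/partial mask
        "layer_b": np.array([True, True, False, False]),   # clean mask
    }
    print(select_mask(candidates, ref)[0])  # layer_b
```

Selecting by minimum cost mirrors the summary's framing of choosing among masks that vary with model structure and timestep; how FreeMask actually scores and combines masks should be taken from the paper itself.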
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis
Methods: Softmax · Attention Is All You Need · Diffusion
