MAKIMA: Tuning-free Multi-Attribute Open-domain Video Editing via Mask-Guided Attention Modulation
Haoyu Zheng, Wenqiao Zhang, Zheqi Lv, Yu Zhong, Yang Dai, Jianxiang, An, Yongliang Shen, Juncheng Li, Dongping Zhang, Siliang Tang, Yueting Zhuang

TL;DR
MAKIMA is a tuning-free framework for multi-attribute open-domain video editing that leverages mask-guided attention modulation and feature propagation to improve editing precision, consistency, and efficiency without additional fine-tuning.
Contribution
The paper introduces MAKIMA, a novel tuning-free multi-attribute video editing method that uses mask-guided attention modulation and feature propagation based on pretrained text-to-image models.
Findings
Outperforms existing methods in editing accuracy
Achieves superior temporal consistency in videos
Maintains computational efficiency during editing
Abstract
Diffusion-based text-to-image (T2I) models have demonstrated remarkable results in global video editing tasks. However, their focus is primarily on global video modifications, and achieving desired attribute-specific changes remains a challenging task, specifically in multi-attribute editing (MAE) in video. Contemporary video editing approaches either require extensive fine-tuning or rely on additional networks (such as ControlNet) for modeling multi-object appearances, yet they remain in their infancy, offering only coarse-grained MAE solutions. In this paper, we present MAKIMA, a tuning-free MAE framework built upon pretrained T2I models for open-domain video editing. Our approach preserves video structure and appearance information by incorporating attention maps and features from the inversion process during denoising. To facilitate precise editing of multiple attributes, we…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
MethodsSoftmax · Attention Is All You Need · Masked autoencoder · Focus
