Qffusion: Controllable Portrait Video Editing via Quadrant-Grid Attention Learning
Maomao Li, Lijian Lin, Yunfei Liu, Ye Zhu, Yu Li

TL;DR
Qffusion is a framework for controllable portrait video editing that combines a quadrant-grid attention scheme with Stable Diffusion, enabling stable editing of arbitrary-length videos with minimal additional training.
Contribution
The paper introduces Qffusion, a new latent re-arrangement and attention-based framework for portrait video editing that does not require extra networks or complex training.
Findings
Outperforms state-of-the-art portrait video editing methods.
Achieves stable editing for arbitrary-length videos.
Requires only a modification to the input format of Stable Diffusion.
Abstract
This paper presents Qffusion, a dual-frame-guided framework for portrait video editing. Specifically, we follow an "animation for editing" design principle: Qffusion is trained as a general animation framework driven by two still reference images, and at inference it can be used for portrait video editing simply by supplying modified start and end frames as the references. Leveraging the generative power of Stable Diffusion, we propose a Quadrant-grid Arrangement (QGA) scheme for latent re-arrangement, which separately arranges the latent codes of the two reference images and those of four facial conditions into four-grid layouts. We then fuse the features of these two modalities and use self-attention for both appearance and temporal learning, where representations at different times are jointly modeled under QGA. Our Qffusion can achieve stable video editing without additional…
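The quadrant-grid arrangement described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper `quadrant_grid`, the latent shapes, the choice of which quadrant holds which latent, the use of noise latents in the two non-reference quadrants, and the additive fusion are all assumptions made for illustration only.

```python
import numpy as np


def quadrant_grid(top_left, top_right, bottom_left, bottom_right):
    """Tile four (C, H, W) latents into a single (C, 2H, 2W) grid."""
    top = np.concatenate([top_left, top_right], axis=2)        # join along width
    bottom = np.concatenate([bottom_left, bottom_right], axis=2)
    return np.concatenate([top, bottom], axis=1)               # join along height


rng = np.random.default_rng(0)
C, H, W = 4, 8, 8  # hypothetical latent shape

# Latents of the two reference images (start and end frames).
start_ref, end_ref = rng.normal(size=(2, C, H, W))
# Assumption: the remaining two quadrants hold noise latents for the
# frames to be generated; the abstract does not specify this.
noisy_a, noisy_b = rng.normal(size=(2, C, H, W))
# Latents of the four facial conditions, one per quadrant.
conditions = rng.normal(size=(4, C, H, W))

appearance_grid = quadrant_grid(start_ref, noisy_a, noisy_b, end_ref)
condition_grid = quadrant_grid(*conditions)

# Assumption: element-wise addition stands in for the unspecified fusion step.
fused = appearance_grid + condition_grid

# Flattening the grid yields one token sequence, so a self-attention layer
# over it would see all four quadrants (and hence all four time steps) at once.
tokens = fused.reshape(C, -1).T  # shape: (2H * 2W, C)
```

Because the four quadrants share one spatial canvas, an unmodified self-attention layer can relate positions across frames, which is consistent with the claim that only the input format of Stable Diffusion changes.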
Taxonomy
Methods: Diffusion
