Shifting the Breaking Point of Flow Matching for Multi-Instance Editing
Carmine Zaccagnino, Fabio Quattrini, Enis Simsar, Marta Tintoré Gazulla, Rita Cucchiara, Alessio Tonioni, Silvia Cascianelli

TL;DR
This paper introduces Instance-Disentangled Attention to improve multi-instance editing in flow matching models, enabling independent, localized edits without interference, and demonstrates its effectiveness on natural images and infographics.
Contribution
The paper proposes an attention mechanism that disentangles instance-specific edits in flow-based models, addressing the interference between concurrent edits that arises from the global conditioning used by existing methods.
Findings
Enhanced edit disentanglement and locality.
Preserved global coherence in multi-instance editing.
Effective on natural images and infographics.
Abstract
Flow matching models have recently emerged as an efficient alternative to diffusion, especially for text-guided image generation and editing, offering faster inference through continuous-time dynamics. However, existing flow-based editors predominantly support global or single-instruction edits and struggle with multi-instance scenarios, where multiple parts of a reference input must be edited independently without semantic interference. We identify this limitation as a consequence of globally conditioned velocity fields and joint attention mechanisms, which entangle concurrent edits. To address this issue, we introduce Instance-Disentangled Attention, a mechanism that partitions joint attention operations, enforcing binding between instance-specific textual instructions and spatial regions during velocity field estimation. We evaluate our approach on both natural image editing and a…
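The abstract describes partitioning joint attention so that each instruction binds only to its own spatial region. The exact partitioning rules are not given here, so the following is a minimal sketch under stated assumptions. It assumes a joint-attention transformer that concatenates text and image tokens into one sequence, that each text token belongs to one instruction (a non-negative instance id), and that each image token carries a region label (an instance id, or -1 for background). The functions `build_instance_mask` and `masked_attention_weights` and the parameters `text_owner` and `image_region` are hypothetical names introduced for illustration. They are not the authors' code.

```python
import math
from typing import List, Sequence


def build_instance_mask(text_owner: Sequence[int],
                        image_region: Sequence[int]) -> List[List[bool]]:
    """Boolean mask over the joint [text | image] token sequence (hypothetical).

    text_owner[i]   : instance id (>= 0) of the i-th text token
    image_region[j] : instance id of the j-th image token, or -1 for background
    mask[q][k] is True when query token q may attend to key token k.
    """
    # Tag every token with its modality and instance id.
    tokens = [("text", k) for k in text_owner] + [("image", r) for r in image_region]
    mask = []
    for q_kind, q_id in tokens:
        row = []
        for k_kind, k_id in tokens:
            if q_kind == "image" and k_kind == "image":
                # Assumption: image tokens keep full mutual attention,
                # which is one way to preserve global coherence.
                allowed = True
            else:
                # Any pair involving text is allowed only within one instance,
                # so instruction k never reaches regions of other instances.
                allowed = q_id == k_id and q_id != -1
            row.append(allowed)
        mask.append(row)
    return mask


def masked_attention_weights(scores: List[List[float]],
                             mask: List[List[bool]]) -> List[List[float]]:
    """Row-wise softmax that gives zero weight to masked-out keys."""
    weights = []
    for s_row, m_row in zip(scores, mask):
        # Every row keeps at least its own token (diagonal is always allowed
        # when text ids are non-negative), so `kept` is never empty.
        peak = max(s for s, m in zip(s_row, m_row) if m)
        exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(s_row, m_row)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    # Two instructions with two text tokens each; four image patches where
    # patches 0-1 belong to instance 0, patch 2 to instance 1, patch 3 is background.
    mask = build_instance_mask([0, 0, 1, 1], [0, 0, 1, -1])
    for row in mask:
        print("".join("1" if m else "." for m in row))
```

In this sketch the text-to-image and image-to-text links are what get partitioned, while image-to-image attention stays open. Whether the actual method also restricts image-to-image attention across instances is not stated in the abstract.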
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Humanities and Scholarship · Computer Graphics and Visualization Techniques
