Prompt-to-Prompt Image Editing with Cross Attention Control
Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, Daniel Cohen-Or

TL;DR
This paper introduces a prompt-to-prompt image editing method that uses the cross-attention layers of text-conditioned models to make intuitive, mask-free, and precise edits driven solely by changes to the text prompt, while preserving most of the original image.
Contribution
The paper reveals that cross-attention layers are key to controlling spatial layout in text-to-image models and develops a prompt-based editing framework without masks.
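The idea of controlling layout through cross-attention can be illustrated with a minimal NumPy sketch. It is a simplified toy, not the authors' implementation: a single attention layer, a word-swap edit in which both prompts have the same number of tokens, and hypothetical names throughout (`cross_attention`, `edit_step`, `inject_until`). The attention maps computed for the source prompt are reused for the edited prompt during an initial portion of the denoising steps, so the spatial layout carries over while the edited prompt's token values supply the new content.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(pixel_queries, token_keys, token_values, injected_maps=None):
    """Return (output, attention_maps) for one toy cross-attention layer.

    pixel_queries: (num_pixels, d) projected image features
    token_keys, token_values: (num_tokens, d) projected prompt embeddings
    injected_maps: optional (num_pixels, num_tokens) maps that override
        the maps computed from this prompt's keys
    """
    d = pixel_queries.shape[-1]
    maps = softmax(pixel_queries @ token_keys.T / np.sqrt(d), axis=-1)
    if injected_maps is not None:
        maps = injected_maps
    return maps @ token_values, maps

def edit_step(src_queries, edit_queries, src_kv, edit_kv, step, inject_until):
    """Run the source and edited branches for one denoising step.

    Assumption: `step` counts up from 0, and the source maps are injected
    into the edited branch while step < inject_until (a hypothetical
    threshold chosen for this sketch).
    """
    src_out, src_maps = cross_attention(src_queries, *src_kv)
    inject = src_maps if step < inject_until else None
    edit_out, edit_maps = cross_attention(edit_queries, *edit_kv, injected_maps=inject)
    return src_out, edit_out, src_maps, edit_maps
```

While injection is active, the edited branch attends to the same image regions as the source branch, token for token, but mixes in the edited prompt's values. Once injection stops, the edited branch computes its own maps.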
Findings
Effective localized and global editing through text modifications
High-quality synthesis with fidelity to edited prompts
Versatile application across diverse images and prompts
Abstract
Recent large-scale text-driven synthesis models have attracted much attention thanks to their remarkable capabilities of generating highly diverse images that follow given text prompts. Such text-based synthesis methods are particularly appealing to humans who are used to verbally describe their intent. Therefore, it is only natural to extend the text-driven image synthesis to text-driven image editing. Editing is challenging for these generative models, since an innate property of an editing technique is to preserve most of the original image, while in the text-based models, even a small modification of the text prompt often leads to a completely different outcome. State-of-the-art methods mitigate this by requiring the users to provide a spatial mask to localize the edit, hence, ignoring the original structure and content within the masked region. In this paper, we pursue an intuitive…
Videos
Google's AI: Stable Diffusion On Steroids! 💪 · YouTube
Taxonomy
Topics: Modular Robots and Swarm Intelligence · Advanced Materials and Mechanics · Interactive and Immersive Displays
