Generative Photomontage
Sean J. Liu, Nupur Kumari, Ariel Shamir, Jun-Yan Zhu

TL;DR
This paper introduces Generative Photomontage, a framework that lets users build a desired image by selecting parts from several generated images and compositing them, giving finer control over the final result than a single generation can.
Contribution
It presents a technique that segments generated images from user brush strokes and blends the selected regions in diffusion feature space, which supports both image customization and correction of generation errors.
Findings
Outperforms existing image blending methods
Enables fixing artifacts and shape errors
Improves prompt alignment in generated images
Abstract
Text-to-image models are powerful tools for image creation. However, the generation process is akin to a dice roll and makes it difficult to achieve a single image that captures everything a user wants. In this paper, we propose a framework for creating the desired image by compositing it from various parts of generated images, in essence forming a Generative Photomontage. Given a stack of images generated by ControlNet using the same input condition and different seeds, we let users select desired parts from the generated results using a brush stroke interface. We introduce a novel technique that takes in the user's brush strokes, segments the generated images using a graph-based optimization in diffusion feature space, and then composites the segmented regions via a new feature-space blending method. Our method faithfully preserves the user-selected regions while compositing them…
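The segment-then-composite pipeline described in the abstract can be illustrated with a rough sketch. Everything below is an assumption made for illustration, not the authors' implementation: `segment_and_composite` is a hypothetical helper, the feature maps are random arrays standing in for diffusion features, the energy (distance to the mean stroke feature plus a Potts smoothness term) is a guess at what a graph-based objective might look like, a synchronous ICM-style update is swapped in for a real graph-cut solver, and the per-pixel gather at the end is only a placeholder for the paper's feature-space blending.

```python
import numpy as np

BIG = 1e6  # cost that effectively forbids a label at a pixel


def segment_and_composite(features, strokes, smooth=1.0, iters=5):
    """Assign each pixel to one image in the stack, then gather its features.

    features: (N, H, W, C) array, one feature map per generated image
              (a stand-in for diffusion features; an assumption).
    strokes:  (H, W) int array, -1 where unmarked, otherwise the index of
              the image the user brushed at that pixel.
    Returns (labels, composite) with shapes (H, W) and (H, W, C).
    """
    n, h, w, _ = features.shape

    # Unary cost: distance to the mean feature under image k's strokes.
    # Images the user never brushed stay forbidden everywhere.
    unary = np.full((n, h, w), BIG)
    for k in range(n):
        marked = strokes == k
        if marked.any():
            mean_k = features[k][marked].mean(axis=0)
            unary[k] = np.linalg.norm(features[k] - mean_k, axis=-1)

    # Hard constraints: a brushed pixel must keep the brushed image.
    for k in range(n):
        unary[k][strokes == k] = 0.0
        unary[k][(strokes >= 0) & (strokes != k)] = BIG

    labels = unary.argmin(axis=0)
    for _ in range(iters):
        # Count, for each label, how many 4-neighbours currently carry it.
        onehot = (labels[None] == np.arange(n)[:, None, None]).astype(float)
        p = np.pad(onehot, ((0, 0), (1, 1), (1, 1)))
        agree = (p[:, :-2, 1:-1] + p[:, 2:, 1:-1]
                 + p[:, 1:-1, :-2] + p[:, 1:-1, 2:])
        # Potts smoothness: reward agreeing with neighbours (ICM-style step).
        labels = (unary - smooth * agree).argmin(axis=0)

    # Placeholder compositing: take each pixel's features from its image.
    rows, cols = np.indices((h, w))
    composite = features[labels, rows, cols]
    return labels, composite


# Usage: two 6x6 feature maps; brush the left column from image 0
# and the right column from image 1.
rng = np.random.default_rng(0)
feats = rng.normal(size=(2, 6, 6, 4))
marks = -np.ones((6, 6), dtype=int)
marks[:, 0] = 0
marks[:, -1] = 1
labels, composite = segment_and_composite(feats, marks)
```

In this sketch the brushed pixels always keep the image the user chose, and the smoothness weight `smooth` controls how strongly unbrushed pixels follow their neighbours. A real graph-cut solver would minimize the same kind of energy more reliably than the local update used here.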
Taxonomy
Topics: 3D Surveying and Cultural Heritage
Methods: Diffusion
