High-Fidelity Guided Image Synthesis with Latent Diffusion Models
Jaskirat Singh, Stephen Gould, Liang Zheng

TL;DR
This paper introduces a novel guided image synthesis method using latent diffusion models that improves detail and user control over semantics, outperforming previous approaches in user satisfaction.
Contribution
The paper models the output image as the solution of a constrained optimization problem, approximates that solution with a single pass of the reverse diffusion process, and adds cross-attention-based semantic control that requires no additional training.
Findings
Outperforms the previous state of the art by over 85% on user satisfaction scores.
Addresses domain shift problem in controllable image synthesis.
Enables semantic control via cross-attention without retraining.
Abstract
Controllable image synthesis with user scribbles has gained huge public interest with the recent advent of text-conditioned latent diffusion models. The user scribbles control the color composition while the text prompt provides control over the overall image semantics. However, we note that prior works in this direction suffer from an intrinsic domain shift problem, wherein the generated outputs often lack details and resemble simplistic representations of the target domain. In this paper, we propose a novel guided image synthesis framework, which addresses this problem by modeling the output image as the solution of a constrained optimization problem. We show that while computing an exact solution to the optimization is infeasible, an approximation of the same can be achieved while just requiring a single pass of the reverse diffusion process. Additionally, we show that by simply…
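As a reading aid, the constrained-optimization framing described in the abstract might be sketched as follows. This is an illustrative form only, not the paper's actual objective: the symbols x (output image), y (user scribble), c (text prompt), f (a map from an image to its coarse color composition), the loss, and the tolerance δ are all assumptions introduced here.

```latex
% Illustrative notation only; none of these symbols come from the summary above.
% x      : output image
% y      : user scribble (controls color composition)
% c      : text prompt (controls overall semantics)
% f      : hypothetical map from an image to its coarse color composition
% \mathcal{L} : hypothetical measure of how poorly x fits the
%               text-conditioned target domain
% \delta : hypothetical tolerance on the scribble constraint
x^{*} = \arg\min_{x} \; \mathcal{L}(x, c)
\quad \text{subject to} \quad
\lVert f(x) - y \rVert \le \delta
```

Under this reading, the constraint keeps the output faithful to the scribble's colors while the objective pulls it toward detailed images of the target domain, which is the domain shift that the abstract says prior methods suffer from.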
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Computer Graphics and Visualization Techniques
Methods: Diffusion
