Designing a Better Asymmetric VQGAN for StableDiffusion
Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, and Gang Hua

TL;DR
This paper introduces an asymmetric VQGAN architecture that enhances image inpainting and editing in StableDiffusion by reducing information loss and artifacts, with minimal retraining and computational overhead.
Contribution
It proposes a novel asymmetric VQGAN design with a heavier decoder and task-specific priors, improving image quality in inpainting and editing tasks without altering the original encoder.
Findings
Significant improvement in inpainting quality.
Enhanced local editing performance.
Maintains original text-to-image capabilities.
Abstract
StableDiffusion is a revolutionary text-to-image generator that is causing a stir in the world of image generation and editing. Unlike traditional methods that learn a diffusion model in pixel space, StableDiffusion learns a diffusion model in the latent space via a VQGAN, ensuring both efficiency and quality. It not only supports image generation tasks, but also enables image editing for real images, such as image inpainting and local editing. However, we have observed that the vanilla VQGAN used in StableDiffusion leads to significant information loss, causing distortion artifacts even in non-edited image regions. To this end, we propose a new asymmetric VQGAN with two simple designs. Firstly, in addition to the input from the encoder, the decoder contains a conditional branch that incorporates information from task-specific priors, such as the unmasked image region in inpainting.…
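The information loss described in the abstract, and the reason the unmasked image region is a natural prior for the decoder, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: average pooling and nearest-neighbour upsampling stand in for the VQGAN encoder and decoder, all function names are hypothetical, and the hard pixel blend in `decode_with_prior` is only a stand-in for the learned conditional branch the paper proposes.

```python
# Toy illustration only: nothing here is learned, and none of these
# functions come from the paper or from the StableDiffusion codebase.

def encode(img, f=2):
    """Lossy stand-in encoder: average each f x f block into one latent value."""
    h, w = len(img), len(img[0])
    return [[sum(img[y + dy][x + dx] for dy in range(f) for dx in range(f)) / (f * f)
             for x in range(0, w, f)]
            for y in range(0, h, f)]

def decode(lat, f=2):
    """Stand-in decoder with no extra input: nearest-neighbour upsampling."""
    return [[lat[y // f][x // f] for x in range(len(lat[0]) * f)]
            for y in range(len(lat) * f)]

def decode_with_prior(lat, image, mask, f=2):
    """Stand-in decoder that also sees the unmasked pixels.

    mask[y][x] == 1 marks pixels to regenerate, 0 marks pixels to keep.
    The hard blend is an assumed simplification of a learned conditional branch.
    """
    out = decode(lat, f)
    return [[out[y][x] if mask[y][x] else image[y][x]
             for x in range(len(out[0]))]
            for y in range(len(out))]

def max_error_outside_mask(a, b, mask):
    """Largest absolute difference over pixels that were not meant to change."""
    return max(abs(a[y][x] - b[y][x])
               for y in range(len(a)) for x in range(len(a[0]))
               if not mask[y][x])

# A 4 x 4 test image and a mask that edits only the right half.
image = [[float((x * 3 + y * 5) % 7) for x in range(4)] for y in range(4)]
mask = [[1 if x >= 2 else 0 for x in range(4)] for y in range(4)]

latent = encode(image)
plain = decode(latent)
conditioned = decode_with_prior(latent, image, mask)

err_plain = max_error_outside_mask(plain, image, mask)
err_cond = max_error_outside_mask(conditioned, image, mask)
print("error in kept region, plain decoder:", err_plain)
print("error in kept region, decoder with prior:", err_cond)
```

In this toy setting the plain round trip alters pixels that were never meant to be edited, while the decoder that receives the unmasked region reproduces them exactly. A learned conditional branch would not copy pixels this literally, so the sketch shows only the motivation, not the method.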
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Cell Image Analysis Techniques · Advanced Vision and Imaging
Methods: Diffusion · Inpainting
