Designing a Better Asymmetric VQGAN for StableDiffusion

Zixin Zhu; Xuelu Feng; Dongdong Chen; Jianmin Bao; Le Wang; Yinpeng Chen; Lu Yuan; Gang Hua

arXiv:2306.04632·cs.CV·June 8, 2023·5 cites

TL;DR

This paper introduces an asymmetric VQGAN architecture that enhances image inpainting and editing in StableDiffusion by reducing information loss and artifacts, with minimal retraining and computational overhead.

Contribution

It proposes a novel asymmetric VQGAN design with a heavier decoder and task-specific priors, improving image quality in inpainting and editing tasks without altering the original encoder.

Findings

01. Significant improvement in inpainting quality.

02. Enhanced local editing performance.

03. Maintains original text-to-image capabilities.

Abstract

StableDiffusion is a revolutionary text-to-image generator that is causing a stir in the world of image generation and editing. Unlike traditional methods that learn a diffusion model in pixel space, StableDiffusion learns a diffusion model in the latent space via a VQGAN, ensuring both efficiency and quality. It not only supports image generation tasks, but also enables image editing for real images, such as image inpainting and local editing. However, we have observed that the vanilla VQGAN used in StableDiffusion leads to significant information loss, causing distortion artifacts even in non-edited image regions. To this end, we propose a new asymmetric VQGAN with two simple designs. Firstly, in addition to the input from the encoder, the decoder contains a conditional branch that incorporates information from task-specific priors, such as the unmasked image region in inpainting.…
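The information loss described in the abstract, and the effect of letting the decoder see the unmasked region, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the 1-D "image", the averaging `encode`, the repeating `decode`, and the hard copy in `decode_with_prior` are hypothetical stand-ins. The paper's conditional branch is part of a learned decoder; it is not the simple pixel copy shown below.

```python
# Toy illustration only: a lossy encode/decode round trip on a 1-D "image".
# None of these functions come from the paper or from StableDiffusion.

def encode(pixels, factor=2):
    """Lossy encoder stand-in: average each block of `factor` pixels."""
    return [sum(pixels[i:i + factor]) / factor
            for i in range(0, len(pixels), factor)]


def decode(latent, factor=2):
    """Vanilla decoder stand-in: repeat each latent value `factor` times."""
    return [value for value in latent for _ in range(factor)]


def decode_with_prior(latent, known_pixels, mask, factor=2):
    """Decoder stand-in that also receives a task-specific prior.

    mask[i] == 1 marks a pixel to regenerate, 0 a pixel to preserve.
    Assumption: a hard copy of the known pixels replaces the learned
    conditional branch purely to keep the example small.
    """
    coarse = decode(latent, factor)
    return [c if m else k for c, k, m in zip(coarse, known_pixels, mask)]


def unmasked_error(output, reference, mask):
    """Sum of absolute errors over pixels that were not meant to change."""
    return sum(abs(o - r) for o, r, m in zip(output, reference, mask) if not m)


image = [10, 30, 50, 70, 90, 110, 130, 150]
mask = [0, 0, 0, 0, 1, 1, 1, 1]  # only the right half is to be edited

latent = encode(image)
vanilla = decode(latent)
conditioned = decode_with_prior(latent, image, mask)

print(unmasked_error(vanilla, image, mask))      # distortion outside the edit
print(unmasked_error(conditioned, image, mask))  # preserved region
```

In this toy setting the vanilla round trip changes pixels outside the mask, while the decoder that receives the unmasked pixels leaves them intact. That is the qualitative behavior the abstract attributes to the conditional branch.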

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Cell Image Analysis Techniques · Advanced Vision and Imaging

Methods: Diffusion · Inpainting