Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers

Shiyue Cao; Yueqin Yin; Lianghua Huang; Yu Liu; Xin Zhao; Deli Zhao; Kaiqi Huang

arXiv:2310.05400 · cs.CV · October 10, 2023 · 1 citation

TL;DR

Efficient-VQGAN introduces a two-stage, attention-efficient framework for high-resolution image generation that combines local and global attention mechanisms, resulting in faster, higher-quality image synthesis.

Contribution

The paper proposes a novel two-stage framework with local attention-based quantization and combined attention mechanisms, improving efficiency and quality in high-resolution image generation.

Findings

01. Outperforms previous methods in image quality and resolution

02. Achieves faster generation speeds

03. Demonstrates superior reconstruction quality

Abstract

Vector-quantized image modeling has shown great potential in synthesizing high-quality images. However, generating high-resolution images remains a challenging task due to the quadratic computational overhead of the self-attention process. In this study, we seek to explore a more efficient two-stage framework for high-resolution image generation with improvements in the following three aspects. (1) Based on the observation that the first quantization stage has solid local property, we employ a local attention-based quantization model instead of the global attention mechanism used in previous methods, leading to better efficiency and reconstruction quality. (2) We emphasize the importance of multi-grained feature interaction during image generation and introduce an efficient attention mechanism that combines global attention (long-range semantic consistency within the whole image) and…
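To make the quadratic overhead mentioned in the abstract concrete, the following is a minimal sketch that counts query-key pairs for global self-attention versus attention restricted to local windows. The non-overlapping square windows, the window size of 8, and the function names are illustrative assumptions; the abstract does not specify the paper's actual local attention scheme, so this should not be read as the authors' implementation.

```python
def global_attention_pairs(height, width):
    """Count query-key pairs when every token attends to every token."""
    n = height * width
    return n * n


def local_attention_pairs(height, width, window):
    """Count query-key pairs when tokens attend only inside their own
    non-overlapping window x window block (an assumed partition scheme)."""
    if height % window or width % window:
        raise ValueError("window must divide both height and width")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window ** 2


# Compare the two costs on square token grids of growing size.
for side in (16, 32, 64):
    g = global_attention_pairs(side, side)
    l = local_attention_pairs(side, side, window=8)
    print(f"{side}x{side} tokens: global={g}, local={l}, ratio={g // l}")
```

Under these assumptions the global cost grows with the square of the token count, while the windowed cost grows linearly with it, so the gap widens as the token grid gets larger.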

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques