Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis

Yuchao Gu; Xintao Wang; Yixiao Ge; Ying Shan; Xiaohu Qie; Mike Zheng Shou

arXiv:2212.03185·cs.CV·March 10, 2023·1 cites

TL;DR

This paper challenges the focus on reconstruction fidelity in VQ tokenizers for image synthesis, proposing a semantic compression approach that improves generative quality by balancing semantic and detail preservation.

Contribution

It introduces SeQ-GAN, a two-phase training method that enhances semantic compression and detail preservation, leading to superior image generation performance.

Findings

01. SeQ-GAN surpasses previous models in FID and IS metrics.

02. Semantic compression improves generative transformer capabilities.

03. Balancing semantic and detail objectives enhances image synthesis quality.

Abstract

Vector-Quantized (VQ-based) generative models usually consist of two basic components, i.e., VQ tokenizers and generative transformers. Prior research focuses on improving the reconstruction fidelity of VQ tokenizers but rarely examines how the improvement in reconstruction affects the generation ability of generative transformers. In this paper, we surprisingly find that improving the reconstruction fidelity of VQ tokenizers does not necessarily improve the generation. Instead, learning to compress semantic features within VQ tokenizers significantly improves generative transformers' ability to capture textures and structures. We thus highlight two competing objectives of VQ tokenizers for image synthesis: semantic compression and details preservation. Different from previous work that only pursues better details preservation, we propose Semantic-Quantized GAN (SeQ-GAN) with two…
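
For readers unfamiliar with the tokenizer step the abstract refers to, the following is a minimal sketch of generic vector quantization: each continuous feature vector is replaced by the index of its nearest codebook entry, and those indices are the discrete tokens a generative transformer would model. It is not SeQ-GAN's implementation. The function name `quantize`, the toy 2-D codebook, and the feature values are all hypothetical and chosen only for illustration.

```python
import numpy as np


def quantize(features, codebook):
    """Assign each feature vector to its nearest codebook entry.

    Hypothetical helper for illustration, not taken from the paper.
    features: array of shape (N, D); codebook: array of shape (K, D).
    Returns the token indices (N,) and the quantized vectors (N, D).
    """
    # Squared Euclidean distance between every feature and every code: (N, K)
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    # The token for each feature is the index of the closest code
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]


# Toy codebook with four 2-D entries (assumed values)
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# Three encoder outputs to tokenize (assumed values)
features = np.array([[0.9, 0.1], [0.1, 0.8], [0.2, 0.1]])

tokens, quantized = quantize(features, codebook)
print(tokens)  # [1 2 0]
```

In a real VQ tokenizer the features would come from a learned encoder and the codebook would be trained jointly with it; the lookup itself is the only part sketched here.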

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques

Methods: Diffusion