Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization

Mengqi Huang; Zhendong Mao; Zhuowei Chen; Yongdong Zhang

arXiv:2305.11718 · cs.CV · May 22, 2023 · 1 citation

TL;DR

This paper introduces a dynamic vector quantization framework for autoregressive image generation, improving accuracy and efficiency by encoding image regions with variable-length codes and generating images from coarse to fine details.

Contribution

It proposes a novel DQ-VAE for variable-length encoding and a DQ-Transformer for coarse-to-fine autoregressive image generation, addressing limitations of fixed-length coding.

Findings

01. Outperforms existing models in quality and speed
02. Effective in various image generation tasks
03. Reduces redundancy and improves detail representation

Abstract

Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm that first learns a codebook to encode images as discrete codes, and then completes generation based on the learned codebook. However, they encode fixed-size image regions into fixed-length codes and ignore their naturally different information densities, which results in insufficiency in important regions and redundancy in unimportant ones, and finally degrades the generation quality and speed. Moreover, the fixed-length coding leads to an unnatural raster-scan autoregressive generation. To address the problem, we propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE) which encodes image regions into variable-length codes based on their information densities for an accurate and compact code representation. (2) DQ-Transformer which thereby generates images…
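The variable-length coding idea in the abstract can be illustrated with a toy allocation scheme. The sketch below is not the authors' DQ-VAE: it assumes Shannon entropy of pixel values as a stand-in for "information density" (the excerpt does not say how density is measured), and the function names, the threshold, and the choice of 1 versus 4 codes per region are all hypothetical. It only shows how dense regions could receive more codes than flat ones, and how regions could then be ordered from coarse to fine.

```python
import math
from collections import Counter


def region_entropy(pixels):
    """Shannon entropy in bits of a flat list of integer pixel values."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def allocate_code_lengths(image, region=4, threshold=1.0, fine=4, coarse=1):
    """Assign a number of codes to each region x region block of a square image.

    Assumption: entropy above `threshold` marks a block as information-dense,
    so it gets `fine` codes; other blocks get `coarse` codes.
    """
    size = len(image)
    allocation = {}
    for top in range(0, size, region):
        for left in range(0, size, region):
            block = [image[r][c]
                     for r in range(top, top + region)
                     for c in range(left, left + region)]
            dense = region_entropy(block) > threshold
            allocation[(top // region, left // region)] = fine if dense else coarse
    return allocation


def coarse_to_fine_order(allocation):
    """Order block positions so that coarsely coded blocks come first."""
    return sorted(allocation, key=lambda pos: (allocation[pos], pos))


# Toy 8x8 image: the left half is flat, the right half varies.
image = [[0 if c < 4 else (r * 8 + c) % 16 for c in range(8)] for r in range(8)]

allocation = allocate_code_lengths(image)
total_codes = sum(allocation.values())
fixed_codes = 4 * len(allocation)  # budget if every block used the fine length

print(allocation)
print(total_codes, "codes instead of", fixed_codes)
print(coarse_to_fine_order(allocation))
```

In this toy case the flat blocks get one code each and the varied blocks get four, so the total code count drops below the fixed-length budget while the detailed regions keep their full allocation.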

Code & Models

Repositories

crossmodalgroup/dynamicvectorquantization (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Advanced Vision and Imaging