Wavelets Are All You Need for Autoregressive Image Generation
Wael Mattar, Idan Levy, Nir Sharon, Shai Dekel

TL;DR
This paper introduces a novel autoregressive image generation method using wavelet-based tokenization combined with a specialized transformer architecture, enabling efficient modeling of image details across multiple resolutions.
Contribution
The work presents a new wavelet-based tokenization scheme and a tailored transformer model for autoregressive image generation, improving detail representation and statistical correlation modeling.
Findings
Effective wavelet tokenization captures multi-resolution image details.
Transformer architecture learns significant statistical correlations.
Experimental results show image generation with conditioning applied to the generation process.
Abstract
In this paper, we take a new approach to autoregressive image generation that is based on two main ingredients. The first is wavelet image coding, which allows us to tokenize the visual details of an image from coarse to fine by ordering the information starting with the most significant bits of the most significant wavelet coefficients. The second is a variant of a language transformer whose architecture is redesigned and optimized for token sequences in this 'wavelet language'. The transformer learns the significant statistical correlations within a token sequence, which are the manifestations of well-known correlations between the wavelet subbands at various resolutions. We show experimental results with conditioning on the generation process.
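To make the coarse-to-fine ordering concrete, here is a minimal sketch of bit-plane tokenization over wavelet coefficients. It is not the authors' tokenizer: the Haar filter, the raster scan order, the token names (`POS`, `NEG`, `INSIG`, `0`, `1`), and the function names `haar2d` and `bitplane_tokens` are all assumptions chosen for illustration. It only shows the general idea of emitting the most significant bits of the largest coefficients first.

```python
import numpy as np

def haar2d(img, levels):
    """Multi-level 2D Haar transform (averaging variant) of a square image.

    Assumes the side length is divisible by 2**levels. The low-pass band
    ends up in the top-left corner of the returned array.
    """
    out = np.asarray(img, dtype=np.float64).copy()
    n = out.shape[0]
    for _ in range(levels):
        block = out[:n, :n]
        # Horizontal pass: averages on the left, differences on the right.
        avg = (block[:, 0::2] + block[:, 1::2]) / 2
        diff = (block[:, 0::2] - block[:, 1::2]) / 2
        block = np.hstack([avg, diff])
        # Vertical pass: averages on top, differences below.
        avg = (block[0::2, :] + block[1::2, :]) / 2
        diff = (block[0::2, :] - block[1::2, :]) / 2
        out[:n, :n] = np.vstack([avg, diff])
        n //= 2
    return out

def bitplane_tokens(coeffs, num_planes):
    """Emit tokens plane by plane, starting at the most significant bit.

    Hypothetical token alphabet (an assumption, not from the paper):
      POS / NEG : coefficient becomes significant at this plane (with sign)
      INSIG     : coefficient is still below the current threshold
      1 / 0     : refinement bit of an already significant coefficient
    """
    q = np.rint(coeffs).astype(np.int64)
    mag = np.abs(q)
    top = int(mag.max()).bit_length() - 1  # index of the highest set bit
    significant = np.zeros(q.shape, dtype=bool)
    tokens = []
    for plane in range(top, max(top - num_planes, -1), -1):
        threshold = 1 << plane
        # Plain raster scan; a real coder would likely scan subband by subband.
        for idx in np.ndindex(q.shape):
            if significant[idx]:
                tokens.append("1" if mag[idx] & threshold else "0")
            elif mag[idx] >= threshold:
                significant[idx] = True
                tokens.append("POS" if q[idx] > 0 else "NEG")
            else:
                tokens.append("INSIG")
    return tokens

if __name__ == "__main__":
    image = np.full((4, 4), 8.0)          # constant test image
    coeffs = haar2d(image, levels=2)      # only the low-pass coefficient is nonzero
    print(bitplane_tokens(coeffs, num_planes=2))
```

Truncating the token sequence after any plane yields a coarser approximation of the image, which is the property that lets an autoregressive model generate coarse structure before fine detail.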
Taxonomy
Topics: Advanced Image Fusion Techniques · Image Retrieval and Classification Techniques · Medical Image Segmentation Techniques
