Spectral Image Tokenizer
Carlos Esteves, Mohammed Suhail, Ameesh Makadia

TL;DR
This paper introduces a spectral image tokenizer based on wavelet transforms that improves autoregressive image modeling by enabling multi-resolution processing, better conditioning, and partial decoding, enhancing image generation and editing capabilities.
Contribution
The paper proposes an image tokenizer that operates on the wavelet spectrum, allowing resolution flexibility, improved conditioning, and partial decoding for autoregressive image models.
Findings
Enhanced image reconstruction quality.
Supports multi-resolution image generation.
Enables effective image upsampling and editing.
Abstract
Image tokenizers map images to sequences of discrete tokens, and are a crucial component of autoregressive transformer-based image generation. The tokens are typically associated with spatial locations in the input image, arranged in raster scan order, which is not ideal for autoregressive modeling. In this paper, we propose to tokenize the image spectrum instead, obtained from a discrete wavelet transform (DWT), such that the sequence of tokens represents the image in a coarse-to-fine fashion. Our tokenizer brings several advantages: 1) it leverages that natural images are more compressible at high frequencies, 2) it can take and reconstruct images of different resolutions without retraining, 3) it improves the conditioning for next-token prediction -- instead of conditioning on a partial line-by-line reconstruction of the image, it takes a coarse reconstruction of the full image, 4)…
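To make the coarse-to-fine idea in the abstract concrete, here is a minimal sketch of a multi-level 2D wavelet decomposition whose coefficients are ordered from coarsest to finest, together with a partial decoder. It is an illustration only, not the authors' tokenizer. The Haar wavelet (in an averaging variant), the function names, and the `keep` parameter are assumptions made for this example; the abstract only says that a DWT is used. The learned mapping from coefficients to discrete tokens is left out entirely.

```python
# Illustrative sketch only: Haar wavelet and all names here are assumptions,
# not the paper's implementation. No quantization into tokens is performed.

def haar_forward(img):
    """One level of a 2D Haar transform (averaging variant) on an even-sized 2D list."""
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(img), 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_ll.append((a + b + c + d) / 4)  # low-pass: average of the 2x2 block
            r_lh.append((a - b + c - d) / 4)  # left-minus-right difference
            r_hl.append((a + b - c - d) / 4)  # top-minus-bottom difference
            r_hh.append((a - b - c + d) / 4)  # diagonal difference
        ll.append(r_ll); lh.append(r_lh); hl.append(r_hl); hh.append(r_hh)
    return ll, (lh, hl, hh)

def haar_inverse(ll, details):
    """Exact inverse of haar_forward; output has twice the height and width of `ll`."""
    lh, hl, hh = details
    h, w = len(ll), len(ll[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, x, y, z = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            out[2 * i][2 * j] = s + x + y + z
            out[2 * i][2 * j + 1] = s - x + y - z
            out[2 * i + 1][2 * j] = s + x - y - z
            out[2 * i + 1][2 * j + 1] = s - x - y + z
    return out

def decompose(img, levels):
    """Return [coarsest approximation, coarsest details, ..., finest details]."""
    details, ll = [], img
    for _ in range(levels):
        ll, d = haar_forward(ll)
        details.append(d)
    return [ll] + details[::-1]  # reverse so the sequence runs coarse to fine

def reconstruct(coeffs, keep=None):
    """Decode from the first `keep` entries; dropped detail bands are treated as zero."""
    keep = len(coeffs) if keep is None else keep
    img = coeffs[0]
    for k, d in enumerate(coeffs[1:], start=1):
        if k >= keep:
            zeros = [[0.0] * len(img[0]) for _ in img]
            d = (zeros, zeros, zeros)
        img = haar_inverse(img, d)
    return img

img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
coeffs = decompose(img, levels=2)
full = reconstruct(coeffs)            # uses every band: recovers the input
coarse = reconstruct(coeffs, keep=1)  # uses only the coarsest band
```

With every band kept, `full` matches `img`. With `keep=1`, `coarse` is a full-size image in which each pixel equals the global mean, a blurry view of the whole image rather than its first few rows. This is the kind of conditioning that point 3 of the abstract contrasts with raster-order tokens.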
Taxonomy
Topics: Image Processing and 3D Reconstruction · Image Retrieval and Classification Techniques
