SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
Hao Chen, Ze Wang, Xiang Li, Ximeng Sun, Fangyi Chen, Jiang Liu, Jindong Wang, Bhiksha Raj, Zicheng Liu, Emad Barsoum

TL;DR
SoftVQ-VAE introduces a continuous image tokenizer that enhances representation capacity and significantly accelerates image generation, achieving high-quality results with fewer tokens and improved efficiency in generative models.
Contribution
The paper presents SoftVQ-VAE, a continuous tokenizer that uses soft categorical posteriors to aggregate multiple codewords into each latent token, giving a higher-capacity image representation and faster generation in Transformer-based models.
Findings
Improves inference throughput by up to 18x for 256x256 images and up to 55x for 512x512 images.
Reduces training iterations by 2.3x while maintaining quality.
Maintains competitive FID scores (1.78 and 2.21) with as few as 32 or 64 tokens.
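One reason a shorter token sequence can speed up a Transformer-based generator is that self-attention compares every token with every other token, so that part of the cost grows with the square of the sequence length. The sketch below only counts those pairwise interactions. The 256-token baseline (a 16x16 latent grid) is a hypothetical reference point and is not taken from the paper. The 18x and 55x figures above are measured throughput numbers and are not derived from this arithmetic.

```python
def attention_pair_count(num_tokens: int) -> int:
    """Number of query-key pairs one self-attention layer evaluates."""
    return num_tokens * num_tokens


def relative_attention_cost(baseline_tokens: int, num_tokens: int) -> float:
    """How many times fewer query-key pairs num_tokens needs than the baseline."""
    return attention_pair_count(baseline_tokens) / attention_pair_count(num_tokens)


# Hypothetical baseline: a 16x16 grid of latent tokens (an assumption, not from the paper).
BASELINE_TOKENS = 256

for n in (64, 32):
    ratio = relative_attention_cost(BASELINE_TOKENS, n)
    print(f"{n} tokens: {ratio:.0f}x fewer attention pairs than {BASELINE_TOKENS} tokens")
```

End-to-end throughput also depends on the feed-forward layers, the decoder, and hardware utilization, so this ratio is an illustration of the scaling trend and not a prediction of wall-clock speedup.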
Abstract
Efficient image tokenization with high compression ratios remains a critical challenge for training generative models. We present SoftVQ-VAE, a continuous image tokenizer that leverages soft categorical posteriors to aggregate multiple codewords into each latent token, substantially increasing the representation capacity of the latent space. When applied to Transformer-based architectures, our approach compresses 256x256 and 512x512 images using as few as 32 or 64 1-dimensional tokens. Not only does SoftVQ-VAE show consistent and high-quality reconstruction, more importantly, it also achieves state-of-the-art and significantly faster image generation results across different denoising-based generative models. Remarkably, SoftVQ-VAE improves inference throughput by up to 18x for generating 256x256 images and 55x for 512x512 images while achieving competitive FID scores of 1.78 and 2.21…
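The idea of aggregating multiple codewords through a soft categorical posterior can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. The logits (negative squared distance to each codeword), the `temperature` parameter, and the function names are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_quantize(z, codebook, temperature=1.0):
    """Return a continuous token as a posterior-weighted mix of codewords.

    Assumption: logits are negative squared distances scaled by temperature.
    """
    logits = [
        -sum((zi - ci) ** 2 for zi, ci in zip(z, codeword)) / temperature
        for codeword in codebook
    ]
    probs = softmax(logits)  # soft categorical posterior over codewords
    token = [
        sum(p * codeword[d] for p, codeword in zip(probs, codebook))
        for d in range(len(z))
    ]
    return token, probs


# Toy codebook with three 2-dimensional codewords (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# An encoder output halfway between the first two codewords blends both.
soft_token, soft_probs = soft_quantize([0.5, 0.0], codebook, temperature=0.1)

# A very low temperature approaches hard, nearest-codeword quantization.
hard_token, hard_probs = soft_quantize([0.9, 0.0], codebook, temperature=1e-3)
```

In this sketch the output token is a convex combination of codewords and so stays continuous, whereas hard vector quantization would pick exactly one codeword per token.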
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Digital Filter Design and Implementation
