Spherical Leech Quantization for Visual Tokenization and Generation
Yue Zhao, Hanwen Jiang, Zhenlin Xu, Chutong Yang, Ehsan Adeli, Philipp Krähenbühl

TL;DR
This paper introduces Spherical Leech Quantization, a novel lattice-based method for visual tokenization and compression that improves reconstruction quality and efficiency over prior approaches, benefiting image generation tasks.
Contribution
It presents a unified lattice coding framework for non-parametric quantization and demonstrates the effectiveness of Leech lattice-based quantization in image tasks.
Findings
Outperforms prior art in image reconstruction quality
Improves the reconstruction-compression tradeoff
Enhances auto-regressive image generation results
Abstract
Non-parametric quantization has received much attention due to its parameter efficiency and scalability to large codebooks. In this paper, we present a unified formulation of different non-parametric quantization methods through the lens of lattice coding. The geometry of lattice codes explains the necessity of auxiliary loss terms when training auto-encoders with certain existing lookup-free quantization variants such as BSQ. As a step forward, we explore a few possible candidates, including random lattices, generalized Fibonacci lattices, and densest sphere packing lattices. Among these, we find that the Leech lattice-based quantization method, dubbed Spherical Leech Quantization (Λ24-SQ), leads to both a simplified training recipe and an improved reconstruction-compression tradeoff thanks to its high symmetry and even distribution on the hypersphere. In image…
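To make the lattice-coding view concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes the usual description of BSQ: a latent vector is projected onto the unit hypersphere and each coordinate is replaced by its sign scaled by 1/sqrt(d). The function names (`bsq_quantize`, `hypercube_codebook`, `nearest_codeword`) are hypothetical. The sketch shows that this sign rule gives the same result as an explicit nearest-codeword search over the 2^d scaled hypercube corners, which is why no lookup table is needed.

```python
import math
from itertools import product


def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def bsq_quantize(v):
    """Lookup-free rule (assumed BSQ form): keep only the sign of each
    coordinate and scale by 1/sqrt(d) so the code has unit norm."""
    scale = 1.0 / math.sqrt(len(v))
    return [scale if x >= 0 else -scale for x in v]


def hypercube_codebook(d):
    """Explicit codebook: all 2^d corners of the hypercube, scaled to unit norm."""
    scale = 1.0 / math.sqrt(d)
    return [[s * scale for s in signs] for signs in product((-1.0, 1.0), repeat=d)]


def nearest_codeword(v, codebook):
    """Brute-force search: on the unit sphere, the nearest codeword is the one
    with the largest inner product with the normalized input."""
    u = normalize(v)
    return max(codebook, key=lambda c: sum(a * b for a, b in zip(u, c)))


if __name__ == "__main__":
    latent = [0.3, -1.2, 0.5, -0.1]
    fast = bsq_quantize(latent)
    slow = nearest_codeword(latent, hypercube_codebook(len(latent)))
    print(fast == slow)
```

Under this view, a different spherical quantizer corresponds to swapping in a different fixed set of unit-norm codewords while keeping the same nearest-point search. The brute-force search here is only practical for tiny dimensions and is included purely for comparison.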
Taxonomy
Topics: Advanced Data Compression Techniques · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis
