LGQ: Learning Discretization Geometry for Scalable and Stable Image Tokenization
Idil Bilge Altun, Mert Onur Cakiroglu, Elham Buxton, Mehmet Dalkilic, Hasan Kurban

TL;DR
LGQ introduces a learnable discretization geometry for image tokenization. Trained end-to-end with differentiable soft assignments, it improves stability, codebook utilization, and efficiency in visual generation tasks.
Contribution
The paper proposes LGQ, an end-to-end learnable geometric quantizer that replaces a fixed discretization geometry with a learned, differentiable one, improving stability and capacity utilization.
Findings
LGQ achieves an 11.88% better rFID than FSQ at a 16K codebook size.
LGQ uses nearly half as many active codes as FSQ and SimVQ.
LGQ maintains high fidelity with fewer active entries, indicating more efficient use of codebook capacity.
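To make the "active codes" finding concrete, the sketch below counts how many codebook entries a tokenizer actually selects over a batch of token indices. It is an illustration only: the function name `codebook_usage`, the rule that a code is "active" if it is selected at least once, and the perplexity measure are assumptions, not definitions taken from the paper.

```python
import numpy as np

def codebook_usage(indices, codebook_size):
    """Summarize how a set of token indices uses a codebook.

    Assumption: a code counts as "active" if it appears at least once.
    Returns (active_count, utilization_fraction, perplexity).
    """
    indices = np.asarray(indices).ravel()
    # Number of times each code was selected.
    counts = np.bincount(indices, minlength=codebook_size)
    active = int((counts > 0).sum())
    # Empirical distribution over the codes that were used.
    p = counts[counts > 0] / counts.sum()
    # Perplexity = exp(entropy): the effective number of codes in use.
    perplexity = float(np.exp(-(p * np.log(p)).sum()))
    return active, active / codebook_size, perplexity

# Toy example: 6 tokens drawn from a hypothetical codebook of 8 entries.
indices = np.array([0, 0, 3, 3, 3, 7])
active, utilization, perplexity = codebook_usage(indices, codebook_size=8)
print(active, utilization, perplexity)
```

Under this reading, a tokenizer that reaches similar reconstruction quality with a smaller `active` count is spending its discrete capacity more economically.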
Abstract
Discrete image tokenization is a key bottleneck for scalable visual generation: a tokenizer must remain compact for efficient latent-space priors while preserving semantic structure and using discrete capacity effectively. Existing quantizers face a trade-off: vector-quantized tokenizers learn flexible geometries but often suffer from biased straight-through optimization, codebook under-utilization, and representation collapse at large vocabularies. Structured scalar or implicit tokenizers ensure stable, near-complete utilization by design, yet rely on fixed discretization geometries that may allocate capacity inefficiently under heterogeneous latent statistics. We introduce Learnable Geometric Quantization (LGQ), a discrete image tokenizer that learns discretization geometry end-to-end. LGQ replaces hard nearest-neighbor lookup with temperature-controlled soft assignments, enabling…
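The contrast between hard nearest-neighbor lookup and temperature-controlled soft assignment can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a softmax over negative squared Euclidean distances, and the names `soft_assign` and `tau` are hypothetical. The abstract does not specify the exact assignment rule LGQ uses.

```python
import numpy as np

def soft_assign(z, codebook, tau):
    """Soft-assign latents to codes with a temperature-scaled softmax.

    z:        (n, d) latent vectors
    codebook: (k, d) code vectors
    tau:      temperature > 0; smaller values approach hard assignment
    Returns (weights of shape (n, k), soft-quantized latents of shape (n, d)).
    Assumption: weights are softmax(-squared_distance / tau).
    """
    # Pairwise squared Euclidean distances, shape (n, k).
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / tau
    # Subtract the row maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    w = w / w.sum(axis=1, keepdims=True)
    # Each output is a convex combination of code vectors.
    return w, w @ codebook

codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([[0.1, 0.2], [0.9, 0.8]])

w_soft, zq_soft = soft_assign(z, codebook, tau=1.0)   # smooth weights
w_hard, zq_hard = soft_assign(z, codebook, tau=1e-3)  # nearly one-hot
print(w_soft.round(3))
print(w_hard.argmax(axis=1))
```

At high temperature every code receives some weight, so in an autodiff framework gradients would reach all entries. As `tau` shrinks, the weights concentrate on the nearest code and the output approaches the hard lookup.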
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Digital Media Forensic Detection
