Structured State-Space Regularization for Generation-Friendly Image Tokenization
Jinsung Lee, Jaemin Oh, Namhun Kim, Dongwon Kim, Byung-Jun Yoon, Suha Kwak

TL;DR
This paper introduces a spectral regularization method for image tokenizers that improves their generative capabilities by organizing the frequency components of the latent space, with minimal impact on reconstruction quality.
Contribution
The paper proposes a novel regularizer, based on state-space models, that induces spectral structure in the latent spaces of image tokenizers and thereby improves generation performance.
Findings
The regularizer improves the generative performance of image tokenizers.
The improvement comes with minimal loss in reconstruction fidelity.
Spectral organization of the latent space enhances downstream generation quality.
Abstract
Image tokenizers play a central role in modern generative models, where the structure of the latent space critically determines the downstream generation performance. A key but underexplored property of effective latent representations is spectral organization, the ability to encode information across frequency components. In this work, we introduce structured state-space regularization, a principled approach to inducing spectral structure in latent spaces. We derive a regularization objective by revisiting state-space models (SSMs) as systems mimicking a basis function's behavior. This perspective reveals that hidden states of SSMs are induced to capture the frequency components, resulting in a novel regularizer that enforces the latent space to capture spectral structure of images. Experiments demonstrate that our regularizer improves the generative performance of image tokenizers…
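The abstract's claim that SSM hidden states can capture frequency components can be illustrated with a small, generic sketch. This is not the paper's regularizer or its training objective; it is a textbook-style example under stated assumptions: a diagonal linear state-space recurrence x_{k+1} = lam * x_k + u_k whose poles lam sit on the unit circle. The function name `ssm_final_state`, the choice of poles, and the toy input are all hypothetical and chosen only for illustration. With these poles, the final hidden state equals the discrete Fourier transform of the input up to a known phase factor.

```python
import numpy as np

def ssm_final_state(u, num_modes):
    """Run a diagonal linear SSM over a 1-D sequence u (illustrative only).

    Assumed recurrence: x_{k+1} = lam * x_k + u_k, starting from x_0 = 0,
    with one pole per mode placed on the unit circle at frequency n / len(u).
    """
    L = len(u)
    n = np.arange(num_modes)
    lam = np.exp(2j * np.pi * n / L)          # oscillatory poles (assumption)
    x = np.zeros(num_modes, dtype=complex)    # hidden state, one entry per mode
    for u_k in u:
        x = lam * x + u_k                     # state update
    return x, lam

# Toy input sequence (hypothetical data, e.g. one flattened latent channel).
rng = np.random.default_rng(0)
u = rng.standard_normal(64)

x, lam = ssm_final_state(u, num_modes=8)

# After len(u) steps, x[n] = conj(lam[n]) * DFT(u)[n], so multiplying by
# lam[n] recovers the first DFT coefficients of the input.
recovered = lam * x
reference = np.fft.fft(u)[:8]
print(np.allclose(recovered, reference))
```

In this toy setting each hidden-state entry tracks exactly one frequency of the input, which is the intuition behind reading SSM states as spectral coefficients. How the paper turns this view into a regularization objective is not specified in the abstract above.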
