Semantic-Aware Prefix Learning for Token-Efficient Image Generation
Qingfeng Li, Haoxian Zhang, Xu He, Songlin Tang, Zhixue Fang, Xiaoqiang Liu, Pengfei Wan, Guoqi Li

TL;DR
This paper introduces SMAP, a semantic-aware prefix tokenizer for image generation that enforces semantic grounding in latent representations, leading to improved reconstruction and generation quality with fewer tokens.
Contribution
SMAP is a novel tokenization method that incorporates class-level semantics into the tokenization process, making semantics essential for representation learning.
Findings
SMAP improves reconstruction quality across tokenization settings.
Semantically grounded latent space enhances downstream generation performance.
SMAP achieves strong results with compact token budgets.
Abstract
Visual tokenizers play a central role in latent image generation by bridging high-dimensional images and tractable generative modeling. However, most existing tokenizers are still trained with reconstruction-dominated objectives, which often yield latent representations that are only weakly grounded in high-level semantics. Recent approaches improve semantic alignment, but typically treat semantic signals as auxiliary regularization rather than making them functionally necessary for representation learning. We propose SMAP, a SeMantic-Aware Prefix tokenizer that injects class-level semantic conditions into a query-based 1D tokenization framework. To make semantics indispensable during training, SMAP introduces a tail token dropping strategy, which forces semantic conditions and early latent prefixes to bear increasing responsibility under progressively reduced token budgets. To verify…
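The tail token dropping idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `drop_tail_tokens` and `build_decoder_input`, the `min_keep` parameter, and the choice to sample the kept prefix length uniformly at random are all assumptions made for illustration, since the abstract does not specify how the token budget is reduced. The sketch only shows the general mechanism: the tail of a 1D token sequence is discarded, so the class condition and the surviving prefix must carry the information.

```python
import random


def drop_tail_tokens(tokens, min_keep=1, rng=None):
    """Keep a random-length prefix of a 1D token sequence and drop the tail.

    Assumption: the kept length is sampled uniformly from [min_keep, len(tokens)].
    The paper's actual schedule may differ.
    """
    rng = rng or random
    if not 1 <= min_keep <= len(tokens):
        raise ValueError("min_keep must be between 1 and len(tokens)")
    keep = rng.randint(min_keep, len(tokens))  # inclusive on both ends
    return tokens[:keep]


def build_decoder_input(class_label, tokens, min_keep=1, rng=None):
    """Prepend a (hypothetical) class-condition entry to the surviving prefix."""
    prefix = drop_tail_tokens(tokens, min_keep=min_keep, rng=rng)
    return [("class", class_label)] + [("latent", t) for t in prefix]


if __name__ == "__main__":
    latent_tokens = list(range(32))  # stand-in for 32 latent tokens
    sequence = build_decoder_input(7, latent_tokens, min_keep=4,
                                   rng=random.Random(0))
    print(len(sequence) - 1, "latent tokens kept after the class condition")
```

Because only a prefix survives each training step, earlier tokens are present far more often than later ones, which is one plausible reading of why the abstract says early latent prefixes and the semantic condition "bear increasing responsibility" as the budget shrinks.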
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Domain Adaptation and Few-Shot Learning
