LINA: Linear Autoregressive Image Generative Models with Continuous Tokens
Jiahao Wang, Ting Pan, Haoge Deng, Dongchen Han, Taiqiang Wu, Xinlong Wang, Ping Luo

TL;DR
LINA is a compute-efficient text-to-image model built on linear attention. It generates high-quality 1024x1024 images at significantly reduced computational cost while remaining competitive on benchmarks.
Contribution
The paper designs and evaluates a novel linear attention mechanism with gating and convolutional augmentations, enabling a simple, efficient, and high-fidelity T2I model called LINA.
Findings
Division-based normalization scales better than subtraction-based normalization for linear generative transformers.
Convolutional locality modeling improves autoregressive image generation.
LINA reduces FLOPs by 61% compared to softmax attention and achieves competitive benchmark results.
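As a rough illustration of the first finding, the sketch below shows one common form of division-based normalization in linear attention: the output for each query is a kernelized numerator divided by the summed kernel scores, computed without forming the n x n attention matrix. This is not taken from the paper. The `elu(x) + 1` feature map, the non-causal single-head setup, and the function names `feature_map`, `linear_attention_division`, and `quadratic_reference` are all assumptions made for the example.

```python
import numpy as np


def feature_map(x):
    # elu(x) + 1, a positive feature map. Assumed for this sketch only;
    # the paper's actual kernel is not stated in this summary.
    return np.where(x > 0, x + 1.0, np.exp(x))


def linear_attention_division(q, k, v, eps=1e-6):
    """Non-causal, single-head linear attention with division-based normalization.

    q, k: (n, d) arrays; v: (n, d_v) array. Cost is linear in n.
    """
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v              # (d, d_v): keys and values summarized once
    k_sum = phi_k.sum(axis=0)     # (d,): summed key features
    numer = phi_q @ kv            # (n, d_v)
    denom = phi_q @ k_sum + eps   # (n,): the division-based normalizer
    return numer / denom[:, None]


def quadratic_reference(q, k, v, eps=1e-6):
    """Same computation written with the explicit (n, n) score matrix."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    scores = phi_q @ phi_k.T      # (n, n)
    weights = scores / (scores.sum(axis=1, keepdims=True) + eps)
    return weights @ v
```

Both functions compute the same result; the linear form reorders the matrix products so that the cost grows with the sequence length n rather than with n squared, which is where the savings over softmax attention come from at high resolution.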
Abstract
Autoregressive models with continuous tokens form a promising paradigm for visual generation, especially for text-to-image (T2I) synthesis, but they suffer from high computational cost. We study how to design compute-efficient linear attention within this framework. Specifically, we conduct a systematic empirical analysis of scaling behavior with respect to parameter counts under different design choices, focusing on (1) normalization paradigms in linear attention (division-based vs. subtraction-based) and (2) depthwise convolution for locality augmentation. Our results show that although subtraction-based normalization is effective for image classification, division-based normalization scales better for linear generative transformers. In addition, incorporating convolution for locality modeling plays a crucial role in autoregressive generation, consistent with findings in diffusion…
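To make the locality-augmentation idea concrete, here is a minimal sketch of a 3x3 depthwise convolution applied to image tokens arranged on a 2D grid: each channel is filtered independently over its spatial neighbors. Where such a convolution sits inside the LINA block, its kernel size, and how it interacts with the autoregressive generation order are not stated in this summary, so the function `depthwise_conv2d`, the 3x3 kernel, and the zero padding are assumptions for illustration only.

```python
import numpy as np


def depthwise_conv2d(tokens, kernels, height, width):
    """3x3 depthwise convolution (cross-correlation form) over a token grid.

    tokens:  (height * width, channels) array of token features.
    kernels: (channels, 3, 3) array, one filter per channel.
    Zero padding keeps the output the same shape as the input.
    """
    channels = tokens.shape[1]
    grid = tokens.reshape(height, width, channels)
    padded = np.pad(grid, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(grid)
    for dy in range(3):
        for dx in range(3):
            # Shifted view of the grid, scaled per channel by one kernel tap.
            out += padded[dy:dy + height, dx:dx + width, :] * kernels[:, dy, dx]
    return out.reshape(height * width, channels)
```

Because every channel has its own filter and channels are never mixed, the operation adds only 9 multiply-adds per token per channel, which is why depthwise convolution is a cheap way to inject local spatial context alongside a global attention layer.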
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Face recognition and analysis
