LINA: Linear Autoregressive Image Generative Models with Continuous Tokens

Jiahao Wang; Ting Pan; Haoge Deng; Dongchen Han; Taiqiang Wu; Xinlong Wang; Ping Luo

arXiv:2601.22630 · cs.CV · February 2, 2026

TL;DR

LINA is a compute-efficient, linear-attention-based autoregressive model with continuous tokens for text-to-image synthesis. It generates high-quality 1024×1024 images, reduces FLOPs by 61% relative to softmax attention, and achieves competitive benchmark results.
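The efficiency argument rests on how attention cost grows with the number of image tokens. The sketch below is a back-of-the-envelope estimate, not the paper's accounting: it assumes a single attention head of width d, counts one multiply-add as 2 FLOPs, ignores projections and MLPs, and uses a hypothetical 16× tokenizer stride to convert image side length into a token count n.

```python
# Rough per-layer FLOP estimate for the token-mixing part of attention only.
# Assumptions (illustrative, not from the paper): single head of width d,
# one multiply-add = 2 FLOPs, projections/MLPs ignored.


def softmax_attention_flops(n: int, d: int) -> int:
    """Q @ K^T (an n x n score matrix) plus scores @ V: each costs ~2 * n^2 * d."""
    return 2 * (2 * n * n * d)


def linear_attention_flops(n: int, d: int) -> int:
    """phi(K)^T @ V (a d x d state) plus phi(Q) @ state: each costs ~2 * n * d^2."""
    return 2 * (2 * n * d * d)


def token_count(image_side: int, downsample: int) -> int:
    """Number of latent tokens for a square image at a given tokenizer stride."""
    side = image_side // downsample
    return side * side


if __name__ == "__main__":
    d = 1024  # hypothetical model width
    for side in (256, 512, 1024):
        n = token_count(side, downsample=16)  # hypothetical 16x tokenizer stride
        ratio = softmax_attention_flops(n, d) / linear_attention_flops(n, d)
        print(f"{side}x{side}: n={n:5d}  softmax/linear ratio = {ratio:.2f}")
```

Under these assumptions the ratio reduces to n/d, so the advantage of linear attention grows with resolution and is largest at 1024×1024. The numbers cover the token-mixing term only and are not meant to reproduce the reported 61% figure, which depends on LINA's actual configuration.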

Contribution

The paper designs and evaluates a linear attention mechanism augmented with gating and depthwise convolution, yielding LINA, a simple, efficient, and high-fidelity text-to-image (T2I) model.
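To make "linear attention with gating" concrete, here is a minimal recurrent sketch in plain Python. The scalar per-step forget gate `g`, the ReLU-plus-epsilon feature map, and all function names are assumptions chosen for illustration (a generic gated-linear-attention form); the summary above does not specify LINA's gate parameterization. Normalization is deliberately omitted so the gating is isolated.

```python
from typing import List

Vector = List[float]


def feature_map(x: Vector) -> Vector:
    """Assumed non-negative feature map: phi(x) = relu(x) + 1e-6."""
    return [max(v, 0.0) + 1e-6 for v in x]


def gated_linear_attention(
    q: List[Vector], k: List[Vector], v: List[Vector], g: List[float]
) -> List[Vector]:
    """Causal linear attention with a hypothetical scalar forget gate g_t in (0, 1].

    Recurrence: S_t = g_t * S_{t-1} + phi(k_t) v_t^T,   o_t = phi(q_t)^T S_t.
    With every g_t = 1 this reduces to plain (unnormalized) causal linear attention.
    """
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # fixed-size memory S_t (d_k x d_v)
    outputs: List[Vector] = []
    for q_t, k_t, v_t, g_t in zip(q, k, v, g):
        phi_k = feature_map(k_t)
        for i in range(d_k):
            for j in range(d_v):
                # Decay the old memory by g_t, then add the new key/value outer product.
                state[i][j] = g_t * state[i][j] + phi_k[i] * v_t[j]
        phi_q = feature_map(q_t)
        # Read the memory with the current query (no normalization in this sketch).
        outputs.append([sum(phi_q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return outputs
```

Because the state has a fixed d_k × d_v size, each new token costs the same amount of work no matter how many tokens precede it; the gate simply lets older contributions fade by the product of later gates.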

Findings

01

Division-based normalization scales better for generative transformers.

02

Convolutional locality modeling improves autoregressive image generation.

03

LINA reduces FLOPs by 61% compared to softmax attention and achieves competitive benchmark results.
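Finding 01 contrasts two ways of normalizing linear attention. In the division-based form, the read-out φ(q_t)ᵀS_t is divided by φ(q_t)ᵀz_t, where z_t accumulates the feature-mapped keys, so every output is a convex combination of past values. The sketch below computes this recurrently in plain Python and checks it against the explicit causal n × n form; the ReLU-plus-epsilon feature map and the function names are assumptions. The subtraction-based variant is not defined in the text above, so it is not sketched here.

```python
from typing import List

Vector = List[float]


def phi(x: Vector) -> Vector:
    """Assumed feature map relu(x) + 1e-6; strictly positive, so denominators never vanish."""
    return [max(v, 0.0) + 1e-6 for v in x]


def linear_attention_division(q: List[Vector], k: List[Vector], v: List[Vector]) -> List[Vector]:
    """Causal linear attention in recurrent form with division-based normalization."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # S_t = sum_{s<=t} phi(k_s) v_s^T
    z = [0.0] * d_k                              # z_t = sum_{s<=t} phi(k_s)
    out: List[Vector] = []
    for q_t, k_t, v_t in zip(q, k, v):
        pk = phi(k_t)
        for i in range(d_k):
            z[i] += pk[i]
            for j in range(d_v):
                state[i][j] += pk[i] * v_t[j]
        pq = phi(q_t)
        denom = sum(pq[i] * z[i] for i in range(d_k))  # phi(q_t)^T z_t > 0
        out.append([sum(pq[i] * state[i][j] for i in range(d_k)) / denom for j in range(d_v)])
    return out


def linear_attention_quadratic(q: List[Vector], k: List[Vector], v: List[Vector]) -> List[Vector]:
    """Reference: the same outputs from an explicit causal weight row per query (O(n^2))."""
    out: List[Vector] = []
    for t, q_t in enumerate(q):
        pq = phi(q_t)
        w = [sum(a * b for a, b in zip(pq, phi(k[s]))) for s in range(t + 1)]
        total = sum(w)  # row normalizer: the "division" in division-based
        out.append([sum(w[s] * v[s][j] for s in range(t + 1)) / total for j in range(len(v[0]))])
    return out
```

The recurrent version keeps only S_t and z_t, while the reference materializes one weight per past token; both divide by the same row sum, which is why feeding a constant value sequence returns that constant unchanged.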

Abstract

Autoregressive models with continuous tokens form a promising paradigm for visual generation, especially for text-to-image (T2I) synthesis, but they suffer from high computational cost. We study how to design compute-efficient linear attention within this framework. Specifically, we conduct a systematic empirical analysis of scaling behavior with respect to parameter counts under different design choices, focusing on (1) normalization paradigms in linear attention (division-based vs. subtraction-based) and (2) depthwise convolution for locality augmentation. Our results show that although subtraction-based normalization is effective for image classification, division-based normalization scales better for linear generative transformers. In addition, incorporating convolution for locality modeling plays a crucial role in autoregressive generation, consistent with findings in diffusion…
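Depthwise convolution adds locality by giving each channel its own small filter over neighboring tokens. The sketch below assumes tokens are processed as a flattened raster sequence, uses a causal 1D kernel so a token never sees later tokens, and adds the convolution as a parallel branch to the attention output (one common placement in linear-attention vision models). The abstract does not say where or in what shape LINA applies its convolution, so the kernel size, placement, and function names are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def depthwise_causal_conv1d(x: Matrix, kernels: Matrix) -> Matrix:
    """x: n tokens x C channels; kernels: C channels x K taps (one filter per channel).

    taps[0] weights the current token and taps[j] the token j steps back, so token t
    only sees tokens t-K+1..t and no future information leaks into an AR model.
    """
    n, c = len(x), len(x[0])
    out = [[0.0] * c for _ in range(n)]
    for ch in range(c):
        taps = kernels[ch]
        for t in range(n):
            acc = 0.0
            for j, w in enumerate(taps):
                src = t - j
                if src >= 0:  # implicit zero left-padding
                    acc += w * x[src][ch]
            out[t][ch] = acc
    return out


def attention_with_locality(attn_out: Matrix, x: Matrix, kernels: Matrix) -> Matrix:
    """Hypothetical placement: add the depthwise-conv branch to the attention output."""
    conv = depthwise_causal_conv1d(x, kernels)
    return [[a + b for a, b in zip(row_a, row_c)] for row_a, row_c in zip(attn_out, conv)]
```

On a raster-ordered sequence, a 1D kernel reaches only the left neighbors in the same row (and the tail of the previous row); a 2D kernel would capture vertical neighbors too but would need masking to stay causal, which this sketch does not attempt.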
