On Computational Limits and Provably Efficient Criteria of Visual Autoregressive Models: A Fine-Grained Complexity Analysis
Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song

TL;DR
This paper analyzes the computational complexity of Visual Autoregressive models, establishing theoretical limits on their efficiency and proposing methods for more scalable image generation within these constraints.
Contribution
It provides the first fine-grained complexity analysis of VAR models, proving that sub-quadratic algorithms are unlikely under SETH and offering practical low-rank approximation strategies.
Findings
Sub-quadratic time complexity for VAR models is unlikely under SETH.
Efficient constructions based on low-rank approximation are compatible with the derived theoretical criteria.
The work initiates a theoretical study of VAR model computational limits.
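As a rough illustration of why low-rank structure helps, the sketch below applies a rank-k matrix U Vᵀ to a value matrix W without ever forming the L × L product. This is a generic associativity trick, not the paper's construction; the helper names (`matmul`, `transpose`, `apply_low_rank`, `apply_dense`) and the toy matrices are assumptions made for the example.

```python
def matmul(a, b):
    """Plain-Python product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    """Transpose a list-of-lists matrix."""
    return [list(r) for r in zip(*m)]


def apply_low_rank(u, v, w):
    """Compute (u @ v^T) @ w as u @ (v^T @ w).

    For u, v of shape L x k and w of shape L x d this costs O(L*k*d)
    multiplications and never builds the L x L matrix.
    """
    return matmul(u, matmul(transpose(v), w))


def apply_dense(u, v, w):
    """Reference version: builds the full L x L matrix first, O(L^2) work."""
    return matmul(matmul(u, transpose(v)), w)


# Toy sizes (hypothetical): L = 4 tokens, rank k = 2, value dimension d = 1.
U = [[1, 0], [0, 1], [1, 1], [2, -1]]
V = [[1, 2], [0, 1], [3, 0], [1, 1]]
W = [[1], [2], [3], [4]]

fast = apply_low_rank(U, V, W)
slow = apply_dense(U, V, W)
```

Both orderings give the same result; only the cost differs. When the rank k is much smaller than the number of tokens L, the first ordering is close to linear in L rather than quadratic.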
Abstract
Recently, Visual Autoregressive (VAR) Models introduced a groundbreaking advancement in the field of image generation, offering a scalable approach through a coarse-to-fine "next-scale prediction" paradigm. If $n$ denotes the height and width of the last VQ code map generated by VAR models, the state-of-the-art algorithm in [Tian, Jiang, Yuan, Peng and Wang, NeurIPS 2024] takes $O(n^{4+o(1)})$ time, which is computationally inefficient. In this work, we analyze the computational limits and efficiency criteria of VAR Models through a fine-grained complexity lens. Our key contribution is identifying the conditions under which VAR computations can achieve sub-quadratic time complexity. We have proved that assuming the Strong Exponential Time Hypothesis (SETH) from fine-grained complexity theory, a sub-quartic time…
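To make the scaling in the abstract concrete: a code map with side length n holds n² tokens, and exact attention compares every token with every other, which gives on the order of n⁴ query-key pairs. The sketch below only counts those pairs for the final scale. It ignores the earlier, coarser scales and the hidden dimension, and the function name `attention_pair_count` is hypothetical.

```python
def attention_pair_count(n: int) -> int:
    """Query-key pairs for exact attention over an n x n token map."""
    tokens = n * n          # one token per cell of the code map
    return tokens * tokens  # every token attends to every token


# Doubling the side length multiplies the pair count by 2**4 = 16.
ratio = attention_pair_count(32) // attention_pair_count(16)
```

This quartic growth in n is why the abstract's lower bound is stated as ruling out sub-quartic time.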
Taxonomy
Topics: Neural Networks and Applications
Methods: ALIGN
