Universal Approximation of Visual Autoregressive Transformers

Yifang Chen; Xiaoyu Li; Yingyu Liang; Zhenmei Shi; Zhao Song

arXiv:2502.06167 · cs.LG · February 11, 2025

Open Access

TL;DR

This paper proves that simple Visual Autoregressive (VAR) transformers are universal approximators for Lipschitz image-to-image functions, a result intended to guide the design of efficient image generation models.

Contribution

It establishes universality for single-head VAR transformers with a single self-attention layer and a single interpolation layer, providing a theoretical foundation and design principles for image synthesis models.

Findings

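1. VAR transformers are reported to outperform previous image synthesis methods, including Diffusion Transformers.

2. Single-layer, single-head VAR transformers are universal approximators for Lipschitz image-to-image functions.

3. Flow-based autoregressive transformers inherit similar approximation capabilities.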

Abstract

We investigate the fundamental limits of transformer-based foundation models, extending our analysis to include Visual Autoregressive (VAR) transformers. VAR represents a big step toward generating images using a novel, scalable, coarse-to-fine "next-scale prediction" framework. These models set a new quality bar, outperforming all previous methods, including Diffusion Transformers, while having state-of-the-art performance for image synthesis tasks. Our primary contributions establish that, for single-head VAR transformers with a single self-attention layer and single interpolation layer, the VAR Transformer is universal. From the statistical perspective, we prove that such simple VAR transformers are universal approximators for any image-to-image Lipschitz functions. Furthermore, we demonstrate that flow-based autoregressive transformers inherit similar approximation capabilities.…

Taxonomy

Topics: Visual perception and processing mechanisms · Color Science and Applications