AlignVAR: Towards Globally Consistent Visual Autoregression for Image Super-Resolution
Cencen Liu (1), Dongyang Zhang (1, 2), Wen Yin (1), Jielei Wang (1, 2), Tianyu Li (1), Ji Guo (1), Wenbo Jiang (1), Guoqing Wang (1), Guoming Lu (1, 2) ((1) University of Electronic Science and Technology of China, (2) Ubiquitous Intelligence

TL;DR
AlignVAR introduces a globally consistent autoregressive framework for image super-resolution, addressing local attention bias and residual supervision issues to improve structural coherence and efficiency.
Contribution
It proposes two novel components, SCA and HCC, to enhance global consistency and stability in visual autoregressive super-resolution models.
Findings
Improves structural coherence and perceptual fidelity.
Achieves over 10x faster inference than diffusion models.
Uses nearly 50% fewer parameters than leading approaches.
Abstract
Visual autoregressive (VAR) models have recently emerged as a promising alternative for image generation, offering stable training, non-iterative inference, and high-fidelity synthesis through next-scale prediction. This encourages the exploration of VAR for image super-resolution (ISR), yet its application remains underexplored and faces two critical challenges: locality-biased attention, which fragments spatial structures, and residual-only supervision, which accumulates errors across scales. Both severely compromise the global consistency of reconstructed images. To address these issues, we propose AlignVAR, a globally consistent visual autoregressive framework tailored for ISR, featuring two key components: (1) Spatial Consistency Autoregression (SCA), which applies an adaptive mask to reweight attention toward structurally correlated regions, thereby mitigating excessive locality and…
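As a rough illustration of what "applying a mask to reweight attention" can mean, the sketch below adds a per-position bias to attention logits before the softmax. The abstract does not say how AlignVAR computes or applies its adaptive mask, so the additive form, the function names (`softmax`, `reweighted_attention`), and the toy values are all assumptions made for this example, not the authors' method.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def reweighted_attention(logits, mask):
    """Row-wise softmax of (logits + mask).

    Hypothetical helper: `mask` is an additive bias with the same shape
    as `logits`. Positive entries raise a key's attention weight and
    negative entries lower it. How such a mask would be derived from
    image structure is not specified here.
    """
    return [
        softmax([l + b for l, b in zip(logit_row, mask_row)])
        for logit_row, mask_row in zip(logits, mask)
    ]


# Toy case: one query attending to four keys.
# Key 0 is a spatial neighbor (high raw logit); key 3 is far away but is
# assumed to be structurally similar to the query.
logits = [[2.0, 0.5, 0.0, 1.0]]
mask = [[0.0, 0.0, 0.0, 1.5]]  # assumed structural-similarity bias

plain = reweighted_attention(logits, [[0.0] * 4])
biased = reweighted_attention(logits, mask)
```

In this toy case the bias shifts attention mass from the neighboring key toward the distant but correlated one, which is the kind of locality reduction the abstract describes in words.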
Taxonomy
Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging
