TL;DR
VARestorer introduces a one-step distillation framework transforming pre-trained VAR models into efficient super-resolution models, achieving state-of-the-art results and faster inference by eliminating iterative refinement.
Contribution
The paper proposes a novel distillation method with pyramid image conditioning and parameter-efficient fine-tuning, improving both efficiency and performance in real-world image super-resolution.
Findings
Achieves 72.32 MUSIQ and 0.7669 CLIPIQA scores on DIV2K.
Cuts inference time by a factor of 10 compared to traditional VAR methods.
Maintains model expressiveness while fine-tuning only 1.2% of parameters.
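To give a feel for how a trainable fraction near 1% can arise, here is a minimal sketch of the arithmetic for a low-rank adapter on a single frozen linear layer. The summary does not say which parameter-efficient method VARestorer uses, so the LoRA-style adapter, the function name `lora_trainable_fraction`, and the layer sizes are all assumptions made for illustration.

```python
def lora_trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Fraction of parameters that are trainable when a frozen d_in x d_out
    weight gets a rank-`rank` adapter (two small matrices: d_in x rank and
    rank x d_out). Hypothetical helper, not taken from the paper."""
    frozen = d_in * d_out              # original weight, kept fixed
    trainable = rank * (d_in + d_out)  # the two low-rank factors
    return trainable / (frozen + trainable)


if __name__ == "__main__":
    # Assumed sizes: a 1024 x 1024 projection with a rank-8 adapter.
    frac = lora_trainable_fraction(1024, 1024, 8)
    print(f"trainable share: {frac:.2%}")
```

The actual share reported above depends on which layers are adapted and how, which this excerpt does not specify.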
Abstract
Recent advancements in visual autoregressive models (VAR) have demonstrated their effectiveness in image generation, highlighting their potential for real-world image super-resolution (Real-ISR). However, adapting VAR for ISR presents critical challenges. The next-scale prediction mechanism, constrained by causal attention, fails to fully exploit global low-quality (LQ) context, resulting in blurry and inconsistent high-quality (HQ) outputs. Additionally, error accumulation in the iterative prediction severely degrades coherence in the ISR task. To address these issues, we propose VARestorer, a simple yet effective distillation framework that transforms a pre-trained text-to-image VAR model into a one-step ISR model. By leveraging distribution matching, our method eliminates the need for iterative refinement, significantly reducing error propagation and inference time. Furthermore, we…
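The causal-attention constraint mentioned in the abstract can be illustrated with the block-wise causal mask commonly used for next-scale prediction, where a token may attend only to tokens at its own or a coarser scale. The sketch below is a generic illustration under that assumption, not VARestorer's code; the function name `block_causal_mask` and the scale sizes are hypothetical.

```python
def block_causal_mask(scale_sizes):
    """Return a boolean matrix where mask[i][j] is True if token i may
    attend to token j, i.e. token j belongs to the same or a coarser scale.
    `scale_sizes` lists the number of tokens per scale, coarse to fine."""
    # Scale index of every token in the flattened multi-scale sequence.
    scale_of = [k for k, n in enumerate(scale_sizes) for _ in range(n)]
    total = len(scale_of)
    return [[scale_of[j] <= scale_of[i] for j in range(total)]
            for i in range(total)]


if __name__ == "__main__":
    # Assumed toy pyramid: 1x1, 2x2, and 3x3 token maps.
    sizes = [1, 4, 9]
    mask = block_causal_mask(sizes)
    total = sum(sizes)
    for start, label in [(0, "coarsest"), (1, "middle"), (5, "finest")]:
        visible = sum(mask[start])
        print(f"{label} scale sees {visible}/{total} tokens")
```

Under such a mask, tokens at coarse scales see only a small part of the sequence, which is one way to picture why the abstract describes global context as under-used during iterative prediction.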
