FVAR: Visual Autoregressive Modeling via Next Focus Prediction
Xiaofan Li, Chenming Wu, Yanpeng Sun, Jiaming Zhou, Delin Qu, Yansong Qu, Weihao Bo, Haibao Yu, Dingkang Liang

TL;DR
FVAR introduces a novel focus-based multi-scale autoregressive model that reduces aliasing artifacts and enhances detail in visual generation by mimicking camera focusing, using physics-based defocus kernels and residual learning.
Contribution
The paper proposes a next-focus prediction paradigm, built on a refocusing pyramid and residual learning, that preserves more detail and produces less aliasing than pyramids built by conventional downsampling.
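To make the aliasing concern concrete, here is a minimal 1D sketch. It is not taken from the paper: the signal length `N`, the frequency `FREQ`, and the helper names are all illustrative assumptions. It compares plain stride-2 decimation with a crude two-tap average followed by decimation, on a tone that lies above the Nyquist limit of the downsampled grid.

```python
import math

N = 64      # number of samples in the original signal (assumed for illustration)
FREQ = 24   # cycles over N samples; above the post-decimation Nyquist limit of 16

signal = [math.cos(2 * math.pi * FREQ * n / N) for n in range(N)]


def decimate(x):
    """Keep every second sample with no low-pass filtering."""
    return x[::2]


def average_then_decimate(x):
    """Average neighbouring pairs (a weak low-pass filter), then keep one value per pair."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def mean_power(x):
    """Mean squared amplitude of a sequence."""
    return sum(v * v for v in x) / len(x)


naive = decimate(signal)
filtered = average_then_decimate(signal)

naive_power = mean_power(naive)
filtered_power = mean_power(filtered)
print(f"naive decimation power:    {naive_power:.4f}")
print(f"averaged decimation power: {filtered_power:.4f}")
```

The naive path keeps the tone at full strength, but at a false lower frequency (8 cycles instead of 24), which is the 1D analogue of jaggies and moiré. The averaged path attenuates it substantially. A two-tap average is only a weak low-pass filter and is used here purely for brevity.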
Findings
Reduces aliasing artifacts in generated images.
Improves preservation of fine details and text readability.
Achieves superior performance on ImageNet benchmarks.
Abstract
Visual autoregressive models achieve remarkable generation quality through next-scale predictions across multi-scale token pyramids. However, the conventional method uses uniform scale downsampling to build these pyramids, leading to aliasing artifacts that compromise fine details and introduce unwanted jaggies and moiré patterns. To tackle this issue, we present FVAR, which reframes the paradigm from next-scale prediction to next-focus prediction, mimicking the natural process of camera focusing from blur to clarity. Our approach introduces three key innovations: 1) Next-Focus Prediction Paradigm that transforms multi-scale autoregression by progressively reducing blur rather than simply downsampling; 2) Progressive Refocusing Pyramid Construction that uses physics-consistent defocus kernels to build clean, alias-free multi-scale…
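The blur-to-clarity idea can be illustrated with a small sketch. The abstract is truncated here and does not specify FVAR's kernels or residual formulation, so everything below is an assumption made for illustration: a 1D box kernel stands in for a disk-shaped defocus kernel, the radius schedule `radii` is arbitrary, every level is kept at full resolution for simplicity, and the residual between consecutive focus levels is a plain difference.

```python
def box_blur(signal, radius):
    """Blur a 1D signal with a normalized box kernel of the given radius.

    The box is a 1D stand-in for a disk-shaped defocus kernel (an assumption).
    Out-of-range indices are clamped to the nearest valid sample.
    """
    if radius == 0:
        return list(signal)
    n = len(signal)
    width = 2 * radius + 1
    out = []
    for i in range(n):
        total = 0.0
        for j in range(i - radius, i + radius + 1):
            total += signal[min(max(j, 0), n - 1)]
        out.append(total / width)
    return out


def focus_pyramid(signal, radii):
    """Return one blurred copy of the signal per radius, from most to least blurred."""
    return [box_blur(signal, r) for r in radii]


def focus_residuals(levels):
    """First entry is the blurriest level; each later entry is the detail added by refocusing."""
    residuals = [list(levels[0])]
    for prev, cur in zip(levels, levels[1:]):
        residuals.append([c - p for c, p in zip(cur, prev)])
    return residuals


def reconstruct(residuals):
    """Sum the residuals; the differences telescope back to the sharpest level."""
    out = [0.0] * len(residuals[0])
    for r in residuals:
        out = [o + v for o, v in zip(out, r)]
    return out


# Toy signal: a flat region, a step, then fine alternating detail.
signal = [0.0] * 8 + [1.0] * 8 + [0.0, 1.0] * 4
radii = [4, 2, 1, 0]  # hypothetical schedule: heavy blur down to fully in focus

levels = focus_pyramid(signal, radii)
residuals = focus_residuals(levels)
restored = reconstruct(residuals)
max_error = max(abs(a - b) for a, b in zip(restored, signal))
print(f"max reconstruction error: {max_error:.2e}")
```

Because the final radius is 0, the last level equals the input, and summing the residuals recovers it up to floating-point rounding. In an autoregressive setting, each residual would be the target predicted at one step, conditioned on the blurrier levels before it.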
Taxonomy
Topics: Advanced Image Processing Techniques · Image Processing Techniques and Applications · Generative Adversarial Networks and Image Synthesis
