FVAR: Visual Autoregressive Modeling via Next Focus Prediction

Xiaofan Li; Chenming Wu; Yanpeng Sun; Jiaming Zhou; Delin Qu; Yansong Qu; Weihao Bo; Haibao Yu; Dingkang Liang

arXiv:2511.18838·cs.CV·November 25, 2025

Open Access

TL;DR

FVAR is a focus-based multi-scale autoregressive model for visual generation. It mimics a camera focusing from blur to clarity, using physics-based defocus kernels and residual learning to reduce aliasing artifacts and preserve fine detail.

Contribution

FVAR proposes a next-focus prediction paradigm, built on a refocusing pyramid and residual learning, that preserves more detail and produces fewer aliasing artifacts than pyramids built by conventional downsampling.

Findings

01. Reduces aliasing artifacts in generated images.

02. Improves preservation of fine details and text readability.

03. Achieves superior performance on ImageNet benchmarks.

Abstract

Visual autoregressive models achieve remarkable generation quality through next-scale predictions across multi-scale token pyramids. However, the conventional method uses uniform scale downsampling to build these pyramids, leading to aliasing artifacts that compromise fine details and introduce unwanted jaggies and moiré patterns. To tackle this issue, we present FVAR, which reframes the paradigm from next-scale prediction to next-focus prediction, mimicking the natural process of camera focusing from blur to clarity. Our approach introduces three key innovations: 1) Next-Focus Prediction Paradigm that transforms multi-scale autoregression by progressively reducing blur rather than simply downsampling; 2) Progressive Refocusing Pyramid Construction that uses physics-consistent defocus kernels to build clean, alias-free multi-scale…
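To make the contrast between uniform downsampling and progressive defocus concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the pillbox (disk) kernel is only one common idealized defocus model, and the function names (`disk_kernel`, `refocus_pyramid`, `to_residuals`), the radius schedule, and the telescoping definition of residuals are all assumptions made for illustration, since this excerpt does not specify them.

```python
import numpy as np

def stripes(n: int) -> np.ndarray:
    """Test image: vertical stripes with a period of 2 pixels (values 0/1)."""
    return np.tile(np.arange(n) % 2, (n, 1)).astype(np.float64)

def disk_kernel(radius: float) -> np.ndarray:
    """Normalized pillbox kernel (assumed stand-in for a defocus PSF)."""
    r = int(np.ceil(radius))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    k = (x**2 + y**2 <= radius**2).astype(np.float64)
    return k / k.sum()

def blur(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Same-size 2-D filtering with edge padding (kernel is symmetric)."""
    r = kernel.shape[0] // 2
    h, w = img.shape
    padded = np.pad(img, r, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def refocus_pyramid(img: np.ndarray, radii) -> list:
    """Levels ordered as given in `radii`; radius 0 means the sharp image."""
    levels = []
    for rad in radii:
        if rad == 0:
            levels.append(img.astype(np.float64))
        else:
            levels.append(blur(img, disk_kernel(rad)))
    return levels

def to_residuals(levels: list) -> list:
    """First level plus successive differences (hypothetical residual form)."""
    return [levels[0]] + [b - a for a, b in zip(levels, levels[1:])]

if __name__ == "__main__":
    img = stripes(16)
    # Naive 2x subsampling keeps only even columns, so the stripes vanish.
    print("naive subsample unique values:", np.unique(img[::2, ::2]))
    levels = refocus_pyramid(img, radii=[3, 2, 1, 0])  # blurry to sharp
    res = to_residuals(levels)
    print("reconstruction exact:", np.allclose(np.sum(res, axis=0), img))
```

In this toy case, naive subsampling of the striped image collapses it to a constant, which is the aliasing failure the abstract describes, whereas blurring before any subsampling keeps the local average. Summing the residuals recovers the sharp image because the differences telescope.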

Taxonomy

Topics: Advanced Image Processing Techniques · Image Processing Techniques and Applications · Generative Adversarial Networks and Image Synthesis