FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction

Siyu Jiao; Gengwei Zhang; Yinlong Qian; Jiancheng Huang; Yao Zhao; Humphrey Shi; Lin Ma; Yunchao Wei; Zequn Jie

arXiv:2502.20313 · cs.CV · January 13, 2026

TL;DR

FlexVAR introduces a novel flexible autoregressive image generation paradigm that enables high-quality, resolution-agnostic, and task-flexible image synthesis without residual prediction, outperforming existing models on benchmark datasets.

Contribution

It proposes FlexVAR, a new autoregressive modeling approach that removes the residual prediction paradigm, allowing for flexible, high-resolution, and multi-task image generation from low-resolution training.

Findings

01. Outperforms VAR on ImageNet 256x256 with 1.0B parameters.

02. Achieves state-of-the-art FID scores in zero-shot transfer.

03. Supports various image-to-image tasks and resolutions.

Abstract

This work challenges the residual prediction paradigm in visual autoregressive modeling and presents FlexVAR, a new Flexible Visual AutoRegressive image generation paradigm. FlexVAR facilitates autoregressive learning with ground-truth prediction, enabling each step to independently produce plausible images. This simple, intuitive approach swiftly learns visual distributions and makes the generation process more flexible and adaptable. Trained solely on low-resolution images ($\leq$ 256px), FlexVAR can: (1) Generate images of various resolutions and aspect ratios, even exceeding the resolution of the training images. (2) Support various image-to-image tasks, including image refinement, in/out-painting, and image expansion. (3) Adapt to various autoregressive steps, allowing for faster inference with fewer steps or enhancing image quality with more steps. Our 1.0B model outperforms its…
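To make the contrast between residual prediction and ground-truth prediction concrete, here is a minimal sketch of how the per-scale training targets could differ. It is not the authors' implementation: it works on a raw 2-D array rather than tokenizer latents, it skips quantization entirely, and the helper names (`downsample`, `upsample`, `ground_truth_targets`, `residual_targets`), the average-pool downsampling, and the nearest-neighbour upsampling are all assumptions chosen for illustration.

```python
def downsample(img, size):
    """Average-pool a square 2-D list to size x size (len(img) must be a multiple of size)."""
    k = len(img) // size
    return [[sum(img[i * k + a][j * k + b] for a in range(k) for b in range(k)) / (k * k)
             for j in range(size)] for i in range(size)]


def upsample(img, size):
    """Nearest-neighbour upsample a square 2-D list to size x size."""
    k = size // len(img)
    return [[img[i // k][j // k] for j in range(size)] for i in range(size)]


def ground_truth_targets(img, scales):
    """Ground-truth style (assumed): each scale's target is the image itself at that scale."""
    return [downsample(img, s) for s in scales]


def residual_targets(img, scales):
    """Residual style (assumed): each scale's target is what the coarser scales left unexplained."""
    n = len(img)
    residual = [row[:] for row in img]
    targets = []
    for s in scales:
        target = downsample(residual, s)
        targets.append(target)
        up = upsample(target, n)
        # Subtract this scale's contribution so the next scale only sees the remainder.
        residual = [[residual[i][j] - up[i][j] for j in range(n)] for i in range(n)]
    return targets


if __name__ == "__main__":
    # Toy 8x8 "image" with values 0..63 and a hypothetical scale schedule.
    img = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
    scales = [1, 2, 4, 8]
    gt = ground_truth_targets(img, scales)
    res = residual_targets(img, scales)
    print("ground-truth target at scale 2:", gt[1])
    print("residual target at scale 2:   ", res[1])
```

In this toy setup every ground-truth target is a valid low-resolution view of the image on its own, which matches the abstract's claim that each step can independently produce a plausible image. The residual targets only make sense when summed with all coarser scales, so a schedule of scales fixed at training time is harder to change afterwards.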

Code & Models

Repositories

jiaosiyu1999/FlexVAR
PyTorch · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Advanced Image Processing Techniques

Methods: Diffusion