FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction
Siyu Jiao, Gengwei Zhang, Yinlong Qian, Jiancheng Huang, Yao Zhao, Humphrey Shi, Lin Ma, Yunchao Wei, Zequn Jie

TL;DR
FlexVAR is an autoregressive image generation paradigm that drops residual prediction in favor of ground-truth prediction at every step, which makes synthesis flexible in resolution, aspect ratio, task, and step count; its 1.0B model outperforms VAR on ImageNet 256x256.
Contribution
It proposes FlexVAR, a new autoregressive modeling approach that removes the residual prediction paradigm, allowing for flexible, high-resolution, and multi-task image generation from low-resolution training.
Findings
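The 1.0B-parameter FlexVAR model outperforms VAR on ImageNet 256x256.
Achieves state-of-the-art FID scores in zero-shot transfer.
Supports image-to-image tasks (image refinement, in/out-painting, image expansion) and output resolutions and aspect ratios beyond those seen in training.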
Abstract
This work challenges the residual prediction paradigm in visual autoregressive modeling and presents FlexVAR, a new Flexible Visual AutoRegressive image generation paradigm. FlexVAR facilitates autoregressive learning with ground-truth prediction, enabling each step to independently produce plausible images. This simple, intuitive approach swiftly learns visual distributions and makes the generation process more flexible and adaptable. Trained solely on low-resolution images (≤ 256px), FlexVAR can: (1) Generate images of various resolutions and aspect ratios, even exceeding the resolution of the training images. (2) Support various image-to-image tasks, including image refinement, in/out-painting, and image expansion. (3) Adapt to various autoregressive steps, allowing for faster inference with fewer steps or enhancing image quality with more steps. Our 1.0B model outperforms its…
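The contrast between residual prediction and ground-truth prediction can be illustrated with a toy sketch. It is not the authors' implementation: it works directly on pixels with nearest-neighbour resizing, it does not model any tokenizer or transformer, and the names `resize`, `ground_truth_targets`, and `residual_targets` are hypothetical. It only shows what each autoregressive step would be asked to predict under the two schemes.

```python
import numpy as np

def resize(img, size):
    """Nearest-neighbour resize of a square 2-D array to (size, size)."""
    idx = np.arange(size) * img.shape[0] // size
    return img[np.ix_(idx, idx)]

def ground_truth_targets(img, scales):
    """Ground-truth scheme (assumed form): each step's target is the
    image itself at that step's resolution, independent of other steps."""
    return [resize(img, s) for s in scales]

def residual_targets(img, scales):
    """Residual scheme (assumed form): each step's target is whatever the
    earlier steps have not yet reconstructed, so steps depend on each other."""
    full = img.shape[0]
    recon = np.zeros_like(img)
    targets = []
    for s in scales:
        r = resize(img - recon, s)       # what is still missing, at scale s
        targets.append(r)
        recon = recon + resize(r, full)  # accumulate the upsampled residual
    return targets

rng = np.random.default_rng(0)
img = rng.random((16, 16))               # toy "image" with values in [0, 1)
scales = [1, 2, 4, 8, 16]                # hypothetical step schedule

gt = ground_truth_targets(img, scales)
res = residual_targets(img, scales)

# Under the residual scheme only the sum of all steps is an image.
res_sum = sum(resize(r, img.shape[0]) for r in res)

# Under the ground-truth scheme a different schedule is still valid,
# because no target depends on the steps that came before it.
gt_short = ground_truth_targets(img, [1, 4, 16])
```

In this toy setting every entry of `gt` is a downsampled copy of the image, so any intermediate step is viewable on its own and the schedule can be shortened or lengthened freely. The entries of `res` are only meaningful once upsampled and summed in order, which ties the output to one fixed schedule.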
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Advanced Image Processing Techniques
Methods: Diffusion
