TL;DR
FlashAR is a post-training framework that converts pre-trained raster-scan autoregressive image models into highly parallel generators, accelerating inference by up to 22.9x with minimal additional training.
Contribution
The paper proposes a lightweight adaptation method, based on two-way next-token prediction, that keeps changes to the original model's training objective minimal while enabling efficient parallel decoding.
Findings
Achieves up to 22.9x speedup in image generation.
Requires only 0.05% of the original training data for adaptation.
Maintains high-quality image generation performance.
Abstract
Large-scale autoregressive models have demonstrated remarkable capabilities in image generation. However, their sequential raster-scan decoding relies strictly on next-token prediction, making inference prohibitively expensive. Existing acceleration methods typically either introduce entirely new generation paradigms that necessitate costly pre-training from scratch, or enable parallel generation at the expense of a training-inference gap or altered prediction objectives. In this paper, we introduce FlashAR, a lightweight post-training adaptation framework that efficiently adapts a pre-trained raster-scan autoregressive model into a highly parallel generator based on two-way next-token prediction. Our key insight is that effective adaptation should minimize modifications to the pre-trained model's original training objective to preserve its learned prior. Accordingly, we retain the…
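To make the cost of sequential raster-scan decoding concrete, the sketch below counts decoding steps for a token grid under two schedules. The anti-diagonal wavefront schedule, the function names, and the 16x16 grid are illustrative assumptions; they are not FlashAR's actual two-way next-token prediction scheme, which the truncated abstract does not specify.

```python
def raster_steps(height: int, width: int) -> int:
    """Sequential raster-scan decoding emits one token per step."""
    return height * width


def wavefront_schedule(height: int, width: int) -> list[list[tuple[int, int]]]:
    """Group grid positions by anti-diagonal index (row + col).

    Hypothetical schedule (an assumption, not the paper's method): a token
    at (r, c) is assumed to depend only on tokens with a smaller r + c, so
    every position in one group could be decoded in the same step.
    """
    groups: list[list[tuple[int, int]]] = [[] for _ in range(height + width - 1)]
    for r in range(height):
        for c in range(width):
            groups[r + c].append((r, c))
    return groups


h, w = 16, 16  # illustrative grid size, not taken from the paper
schedule = wavefront_schedule(h, w)
print(raster_steps(h, w))  # 256 sequential steps
print(len(schedule))       # 31 parallel steps
```

Under these assumptions the step count falls from height x width to height + width - 1. The sketch counts steps only and says nothing about wall-clock speedup or image quality.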
