DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction
Yiheng Liu, Liao Qu, Huichao Zhang, Xu Wang, Yi Jiang, Yiming Gao, Hu Ye, Xian Li, Shuai Wang, Daniel K. Du, Fangmin Chen, Zehuan Yuan, Xinglong Wu

TL;DR
DetailFlow introduces a novel coarse-to-fine 1D autoregressive image generation approach that models images through next-detail prediction, achieving high quality and efficiency with fewer tokens and faster inference.
Contribution
The paper proposes a resolution-aware token sequence and a parallel inference mechanism, which together reduce the number of tokens required and accelerate image generation in autoregressive models.
Findings
Achieves 2.96 FID on ImageNet 256x256 with 128 tokens
Runs nearly 2x faster than previous methods like VAR and FlexVAR
Outperforms prior models in quality with fewer tokens
Abstract
This paper presents DetailFlow, a coarse-to-fine 1D autoregressive (AR) image generation method that models images through a novel next-detail prediction strategy. By learning a resolution-aware token sequence supervised with progressively degraded images, DetailFlow enables the generation process to start from the global structure and incrementally refine details. This coarse-to-fine 1D token sequence aligns well with the autoregressive inference mechanism, providing a more natural and efficient way for the AR model to generate complex visual content. Our compact 1D AR model achieves high-quality image synthesis with significantly fewer tokens than previous approaches such as VAR and VQGAN. We further propose a parallel inference mechanism with self-correction that accelerates generation speed by approximately 8x while reducing the accumulated sampling error inherent in teacher-forcing…
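The idea of supervising token prefixes with progressively degraded images can be illustrated with a small sketch. This is not the authors' implementation: the `degrade` helper, the average-pool-then-repeat degradation, and the `(prefix_tokens, factor)` schedule are all assumptions chosen for illustration, since the abstract does not specify how token count maps to resolution.

```python
from typing import List, Tuple

Image = List[List[float]]  # square grayscale image as nested lists


def degrade(image: Image, factor: int) -> Image:
    """Average-pool by `factor`, then upsample back by repetition.

    Hypothetical stand-in for whatever degradation the paper uses.
    """
    size = len(image)
    if factor < 1 or size % factor != 0:
        raise ValueError("factor must be a positive divisor of the image size")
    out = [[0.0] * size for _ in range(size)]
    for by in range(0, size, factor):
        for bx in range(0, size, factor):
            # Mean of the factor x factor block starting at (by, bx).
            block = [image[y][x]
                     for y in range(by, by + factor)
                     for x in range(bx, bx + factor)]
            mean = sum(block) / len(block)
            # Write the mean back to every pixel of the block.
            for y in range(by, by + factor):
                for x in range(bx, bx + factor):
                    out[y][x] = mean
    return out


def coarse_to_fine_targets(
    image: Image, schedule: List[Tuple[int, int]]
) -> List[Tuple[int, Image]]:
    """Pair each token-prefix length with its degraded reconstruction target."""
    return [(n_tokens, degrade(image, factor)) for n_tokens, factor in schedule]


# Toy 4x4 image with pixel values 0..15.
image = [[float(x + 4 * y) for x in range(4)] for y in range(4)]

# Hypothetical schedule: shorter prefixes are supervised with coarser targets.
schedule = [(32, 4), (64, 2), (128, 1)]
targets = coarse_to_fine_targets(image, schedule)
```

In this sketch the shortest prefix is trained against a target that keeps only the global mean, while the full 128-token sequence is trained against the undegraded image, which mirrors the "global structure first, details later" ordering described in the abstract.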
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
