DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction

Yiheng Liu; Liao Qu; Huichao Zhang; Xu Wang; Yi Jiang; Yiming Gao; Hu Ye; Xian Li; Shuai Wang; Daniel K. Du; Fangmin Chen; Zehuan Yuan; Xinglong Wu

arXiv:2505.21473 · cs.CV · November 12, 2025

TL;DR

DetailFlow introduces a novel coarse-to-fine 1D autoregressive image generation approach that models images through next-detail prediction, achieving high quality and efficiency with fewer tokens and faster inference.

Contribution

The paper proposes a resolution-aware token sequence and a parallel inference mechanism, which together reduce the number of tokens and accelerate image generation in autoregressive models.

Findings

01 Achieves 2.96 FID on ImageNet 256×256 with 128 tokens

02 Runs nearly 2x faster than previous methods like VAR and FlexVAR

03 Outperforms prior models in quality with fewer tokens

Abstract

This paper presents DetailFlow, a coarse-to-fine 1D autoregressive (AR) image generation method that models images through a novel next-detail prediction strategy. By learning a resolution-aware token sequence supervised with progressively degraded images, DetailFlow enables the generation process to start from the global structure and incrementally refine details. This coarse-to-fine 1D token sequence aligns well with the autoregressive inference mechanism, providing a more natural and efficient way for the AR model to generate complex visual content. Our compact 1D AR model achieves high-quality image synthesis with significantly fewer tokens than previous approaches, e.g., VAR/VQGAN. We further propose a parallel inference mechanism with self-correction that accelerates generation speed by approximately 8x while reducing the accumulated sampling error inherent in teacher-forcing…
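The idea of supervising token prefixes with progressively degraded images can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the function names `target_resolution` and `degrade`, the linear token-to-resolution schedule, the default values, and the use of block averaging as the degradation. It also assumes a square grayscale image and a `min_res` that divides `full_res`.

```python
def target_resolution(num_tokens, max_tokens=128, full_res=256, min_res=16):
    """Hypothetical schedule: map a token-prefix length to a supervision resolution.

    Longer prefixes are supervised with sharper (higher-resolution) targets.
    Assumes min_res divides full_res.
    """
    if not 1 <= num_tokens <= max_tokens:
        raise ValueError("num_tokens must be in [1, max_tokens]")
    # Linear interpolation between the coarsest and the full resolution (assumed).
    ideal = min_res + (full_res - min_res) * num_tokens / max_tokens
    # Snap down to a resolution that divides full_res, so block averaging is exact.
    candidates = [r for r in range(min_res, full_res + 1) if full_res % r == 0]
    return max(r for r in candidates if r <= ideal)


def degrade(image, res):
    """Downsample a square grayscale image (list of lists) to res x res by block averaging."""
    full = len(image)
    if full % res != 0:
        raise ValueError("res must divide the image side length")
    f = full // res  # side length of each averaged block
    out = []
    for i in range(res):
        row = []
        for j in range(res):
            block = [image[i * f + di][j * f + dj]
                     for di in range(f) for dj in range(f)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Usage: short prefixes map to coarse targets, the full sequence to the full image.
for k in (1, 32, 64, 128):
    print(k, "tokens ->", target_resolution(k), "px target")
```

Under this assumed schedule, a decoder trained to reconstruct `degrade(image, target_resolution(k))` from the first `k` tokens would be pushed to place global structure in early tokens and finer detail in later ones, which is the coarse-to-fine ordering the abstract describes.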

Code & Models

Repositories

byteflow-ai/detailflow
PyTorch · Official

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings