DPAR: Dynamic Patchification for Efficient Autoregressive Visual Generation
Divyansh Srivastava, Akshay Mehra, Pranav Maneriker, Debopam Sanyal, Vishnu Raj, Vijay Kamarshi, Fan Du, Joshua Kimball

TL;DR
DPAR introduces a dynamic patchification method for autoregressive image generation, reducing computational costs and improving image quality by adaptively merging tokens into larger patches based on information content.
Contribution
According to the authors, DPAR is the first work to use next-token prediction entropy as the criterion for adaptive token merging, enabling efficient and scalable autoregressive image generation with minimal architectural changes.
Findings
Reduces token count by up to 2.06x at higher resolutions.
Achieves up to 40% reduction in training FLOPs.
Improves FID scores by up to 27.1% over baselines.
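To put the token-count finding in context, the following is a rough arithmetic sketch, not a reproduction of the paper's measurements. It assumes a fixed-grid tokenizer with a downsampling factor of 16 (an illustrative value not stated in the source) and applies the reported best-case 2.06x reduction to one resolution. The squared ratio covers only the pairwise self-attention term, not total training FLOPs. The names `token_count` and `attention_pair_ratio` are hypothetical.

```python
def token_count(resolution, downsample=16):
    """Tokens on a fixed square grid; downsample=16 is an assumed value."""
    side = resolution // downsample
    return side * side


def attention_pair_ratio(reduction):
    """Self-attention compares every pair of positions, so cutting the
    sequence length by `reduction` cuts pairwise interactions by its square."""
    return reduction ** 2


# Doubling the resolution quadruples the fixed-grid token count.
tokens_256 = token_count(256)   # 16 x 16 grid
tokens_512 = token_count(512)   # 32 x 32 grid

# Applying the reported best-case 2.06x reduction at 512 x 512 (illustrative).
patches_512 = tokens_512 / 2.06
pair_saving = attention_pair_ratio(2.06)
```

Under these assumptions a 512 x 512 image drops from 1024 tokens to roughly 497 patches, and the pairwise attention term shrinks by a factor of about 4.2.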
Abstract
Decoder-only autoregressive image generation typically relies on fixed-length tokenization schemes whose token counts grow quadratically with resolution, substantially increasing the computational and memory demands of attention. We present DPAR, a novel decoder-only autoregressive model that dynamically aggregates image tokens into a variable number of patches for efficient image generation. Our work is the first to demonstrate that next-token prediction entropy from a lightweight and unsupervised autoregressive model provides a reliable criterion for merging tokens into larger patches based on information content. DPAR makes minimal modifications to the standard decoder architecture, ensuring compatibility with multimodal generation frameworks and allocating more compute to generation of high-information image regions. Further, we demonstrate that training with dynamically sized…
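The entropy criterion described in the abstract can be illustrated with a minimal sketch. The truncated abstract does not give the exact merging rule, so the rule used here is an assumption: a token opens a new patch when the entropy of its next-token distribution exceeds a threshold, and otherwise joins the current patch. The function names, the threshold, and the toy entropy values are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def merge_by_entropy(entropies, threshold):
    """Group token indices into variable-length patches.

    Assumed rule (not taken from the paper): a token starts a new patch when
    its prediction entropy is above `threshold`; low-entropy (predictable)
    tokens are appended to the current patch.
    """
    patches = []
    for index, h in enumerate(entropies):
        if index == 0 or h > threshold:
            patches.append([index])
        else:
            patches[-1].append(index)
    return patches


# Toy per-token entropies, standing in for the output of a small
# autoregressive entropy model run over an image's token sequence.
toy_entropies = [2.0, 0.1, 0.2, 1.5, 0.3]
patches = merge_by_entropy(toy_entropies, threshold=1.0)
```

With these toy values the five tokens collapse into two patches, so the decoder would process two positions instead of five, while the high-entropy tokens at indices 0 and 3 each begin their own patch.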
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Handwritten Text Recognition Techniques
