TL;DR
This paper investigates how ordered token structures, especially coarse-to-fine sequences, improve the efficiency and scalability of test-time search in autoregressive generative models, particularly for image generation.
Contribution
It demonstrates that coarse-to-fine ordered tokens enable more effective test-time search and training-free generation, offering practical guidance for inference scalability.
Findings
Coarse-to-fine tokens improve test-time scaling in AR models.
Ordered tokens enable training-free, search-based image generation.
Classical search algorithms behave differently depending on the token structure.
Abstract
Tokenization is a key component of autoregressive (AR) generative models, converting raw data into more manageable units for modeling. Commonly, tokens describe local information, such as regions of pixels in images or word pieces in text, and AR generation predicts these tokens in a fixed order. A worthwhile question is whether token structures affect the ability to steer the generation through test-time search, where multiple candidate generations are explored and evaluated by a verifier. Using image generation as our testbed, we hypothesize that recent 1D ordered tokenizers with coarse-to-fine structure can be more amenable to search than classical 2D grid structures. This is rooted in the fact that the intermediate states in coarse-to-fine sequences carry semantic meaning that verifiers can reliably evaluate, enabling effective steering during generation. Through controlled…
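To make the idea of verifier-guided test-time search concrete, here is a minimal sketch of beam search over token prefixes, where a verifier scores partial sequences at every step. It is an illustration under stated assumptions, not the paper's implementation: `verifier_guided_beam_search`, `propose_all_tokens`, `toy_verifier`, `TARGET`, and `VOCAB_SIZE` are hypothetical names, the proposal function stands in for an AR model, and the toy verifier stands in for an image-text verifier. The verifier weights earlier tokens more heavily to mimic a coarse-to-fine ordering, in which early tokens carry most of the semantic content.

```python
from typing import Callable, List, Sequence, Tuple

Prefix = Tuple[int, ...]

# Hypothetical toy setup: a 4-token vocabulary and a hidden "ideal" sequence.
VOCAB_SIZE = 4
TARGET: Prefix = (2, 0, 3, 1)


def propose_all_tokens(prefix: Prefix) -> Sequence[int]:
    """Stand-in for an AR model: proposes every token as a possible next step."""
    return range(VOCAB_SIZE)


def toy_verifier(prefix: Prefix) -> float:
    """Stand-in for a verifier that can score partial sequences.

    Errors at earlier positions are penalized more (weight 1, 1/2, 1/4, ...),
    mimicking a coarse-to-fine ordering. Higher scores are better.
    """
    return -sum(abs(t - g) / (2 ** i) for i, (t, g) in enumerate(zip(prefix, TARGET)))


def verifier_guided_beam_search(
    propose: Callable[[Prefix], Sequence[int]],
    verifier: Callable[[Prefix], float],
    seq_len: int,
    beam_width: int,
) -> Prefix:
    """Keep the `beam_width` best-scoring prefixes after each generation step."""
    beams: List[Prefix] = [()]
    for _ in range(seq_len):
        # Expand every kept prefix by each proposed next token.
        candidates = [b + (t,) for b in beams for t in propose(b)]
        # Score the intermediate states and keep only the best ones.
        candidates.sort(key=verifier, reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


best = verifier_guided_beam_search(
    propose_all_tokens, toy_verifier, seq_len=len(TARGET), beam_width=2
)
print(best)  # (2, 0, 3, 1)
```

The search only works because `toy_verifier` returns a meaningful score for incomplete prefixes. That is the property the abstract attributes to coarse-to-fine token sequences: their intermediate states can be evaluated reliably, so poor candidates can be pruned early.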
