TL;DR
This paper introduces a framework that applies classical search algorithms to control and improve diffusion models during inference, enhancing their performance and efficiency across various tasks.
Contribution
It presents a novel approach combining local and global search techniques for inference-time control in diffusion models, grounded in classical search principles.
Findings
Significant performance improvements in planning, offline reinforcement learning, and image generation.
Enhanced efficiency in inference through the proposed search framework.
Demonstrated theoretical and practical benefits of classical search in diffusion models.
Abstract
Classical search algorithms have long underpinned modern artificial intelligence. In this work, we tackle the challenge of inference-time control in diffusion models (adapting generated outputs to meet diverse test-time objectives) using principles from classical search. We propose a general framework that orchestrates local and global search to efficiently navigate the generative space. It employs a theoretically grounded local search via annealed Langevin MCMC and performs compute-efficient global exploration using breadth-first and depth-first tree search. We evaluate our approach on a range of challenging domains, including planning, offline reinforcement learning, and image generation. Across all tasks, we observe significant gains in both performance and efficiency. These results show that classical search provides a principled and practical foundation for inference-time…
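The abstract's local search runs annealed Langevin MCMC on the model's distribution tilted by a reward. The toy sketch below illustrates that general technique in 1D. It is not the paper's implementation. Several pieces are assumptions made for illustration. A closed-form score for a smoothed N(0, 1) prior stands in for a learned score network. `reward_grad` is the gradient of a hypothetical differentiable reward r(x) = -(x - 2)^2. The names `prior_score`, `reward_grad`, and `annealed_langevin` are hypothetical. Each noise level runs a few unadjusted Langevin steps on log p_sigma(x) + guidance * r(x), and sigma decreases across levels. With guidance 1, the final target is proportional to exp(-x^2/2 - (x - 2)^2). That is a Gaussian with precision 3 and mean 4/3, so the samples can be checked.

```python
import math
import random
from typing import Optional, Sequence


def prior_score(x: float, sigma: float) -> float:
    """Score of the toy prior N(0, 1) smoothed by noise level sigma, i.e. of N(0, 1 + sigma^2).

    Stands in for a learned diffusion score network (assumption for illustration).
    """
    return -x / (1.0 + sigma ** 2)


def reward_grad(x: float, target: float = 2.0) -> float:
    """Gradient of the hypothetical differentiable reward r(x) = -(x - target)^2."""
    return -2.0 * (x - target)


def annealed_langevin(x: float, sigmas: Sequence[float], steps_per_level: int = 50,
                      step_size: float = 0.05, guidance: float = 1.0,
                      rng: Optional[random.Random] = None) -> float:
    """Unadjusted Langevin on log p_sigma(x) + guidance * r(x), sweeping sigma from high to low."""
    rng = rng or random.Random(0)
    for sigma in sigmas:  # annealing: each level targets a less-smoothed distribution
        for _ in range(steps_per_level):
            drift = prior_score(x, sigma) + guidance * reward_grad(x)
            x = x + step_size * drift + math.sqrt(2.0 * step_size) * rng.gauss(0.0, 1.0)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    sigmas = [1.0, 0.5, 0.25, 0.1, 0.01]
    samples = [annealed_langevin(rng.gauss(0.0, math.sqrt(2.0)), sigmas, rng=rng)
               for _ in range(2000)]
    mean = sum(samples) / len(samples)
    print(f"sample mean {mean:.3f}; tilted target mean is 4/3")
```

The Langevin update uses no Metropolis correction, so a finite step size slightly inflates the variance. In this toy, the stationary variance is about 0.36 rather than 1/3. The mean is unaffected.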
Peer Reviews
Decision: ICLR 2026 Poster
- Principled, general framework unifying global tree search and local gradient-based MCMC; theory linking recurrence to Langevin MCMC, with convergence insights.
- Algorithmic contributions: an improved BFS design and a novel adaptive DFS that scales compute with instance difficulty, substantially boosting quality per NFE.
- Strong empirical results and ablations across tasks: a new Pareto frontier, competitive offline RL without retraining, effective policy distillation, and robustness via a double verifier.
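The link between recurrence and Langevin MCMC can be made plausible with a heuristic first-order calculation. Recurrence here means taking one denoising step and then re-noising back to the same level. This calculation is not the paper's theorem or proof. It assumes a DDPM-style step with a small noise increment β, an exact score s_t(x) = ∇_x log p_t(x), and ancestral sampling noise variance β. Under those assumptions, denoising followed by re-noising is one Langevin step with step size β that targets p_t, up to an O(β²) error in the noise variance:

```latex
\begin{aligned}
\text{denoise:}\quad x_{t-1} &= \frac{1}{\sqrt{1-\beta}}\bigl(x_t + \beta\, s_t(x_t)\bigr) + \sqrt{\beta}\, z, && z \sim \mathcal{N}(0, I),\\
\text{re-noise:}\quad \tilde{x}_t &= \sqrt{1-\beta}\, x_{t-1} + \sqrt{\beta}\, \varepsilon, && \varepsilon \sim \mathcal{N}(0, I),\\
\Rightarrow\quad \tilde{x}_t &= x_t + \beta\, s_t(x_t) + \sqrt{(1-\beta)\beta}\, z + \sqrt{\beta}\, \varepsilon,\\
\sqrt{(1-\beta)\beta}\, z + \sqrt{\beta}\, \varepsilon &\sim \mathcal{N}\bigl(0, (2\beta - \beta^2) I\bigr) \approx \mathcal{N}(0, 2\beta I),\\
\Rightarrow\quad \tilde{x}_t &\approx x_t + \beta\, \nabla_x \log p_t(x_t) + \sqrt{2\beta}\, \xi, && \xi \sim \mathcal{N}(0, I).
\end{aligned}
```

Suppose the score is replaced by a guided score ∇ log p_t + λ ∇ r_t, where λ is a hypothetical guidance weight and r_t a hypothetical reward. The same step then targets the tilted density proportional to p_t exp(λ r_t). Repeating it while t decreases gives an annealed schedule.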
- The proposed BFS and DFS methods were shown to be effective across various scenarios. However, establishing a unified methodology with default hyperparameter settings (beyond TTS in RL tasks) would make the approach more practical for broader use.
- No comparisons with T2I diffusion models more recent than SD1.5 and SDXL, or with recent baselines (e.g., [1]).
- How stable is the proposed method against reward hacking without the double verifier? Since reward hacking is a critical issue, it should…
- The various design choices for the BFS method are clearly presented, and the studies help clarify which components are helpful.
- The threshold-based method is a practical, useful modification for DFS, and experiments show it indeed scales up compute for harder problems as intended.
- The theoretical result provides a bridge, showing that guided recurrence approximates annealed Langevin MCMC in the small-step limit.
- The experiments span multiple domains, including text-to-image generation, path generati…
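The reviews credit the threshold-based DFS with spending more compute on harder instances. A minimal sketch of that general idea follows, under stated assumptions. It is a generic threshold-and-backtrack DFS, not the paper's algorithm. `expand` draws one stochastic child, as one denoising step would. `value` is a verifier score on a partial sample. Children below `threshold` are rejected and resampled. After `max_retries` rejections, the search backtracks to the parent. Everything in the toy instance is hypothetical: the names `threshold_dfs`, `make_toy_expand`, and `good_fraction`, the parameter values, and the use of `p_good` as a proxy for difficulty.

```python
import random
from typing import Callable, Optional, Tuple

Node = Tuple[int, int]  # toy partial sample: (steps taken, steps judged "good")


def threshold_dfs(root: Node, expand: Callable[[Node, random.Random], Node],
                  value: Callable[[Node], float], depth: int, threshold: float,
                  max_retries: int = 4, max_expansions: int = 10_000,
                  rng: Optional[random.Random] = None) -> Tuple[Optional[Node], int]:
    """DFS that descends only into children scoring >= threshold; returns (leaf, expansions used)."""
    rng = rng or random.Random(0)
    stack = [[root, 0, 0]]  # entries: [node, depth, rejected children so far]
    expansions = 0
    while stack and expansions < max_expansions:
        entry = stack[-1]
        node, d, fails = entry
        if d == depth:
            return node, expansions  # reached a full-length sample
        if fails >= max_retries and len(stack) > 1:
            stack.pop()  # backtrack: this partial sample keeps failing the verifier
            continue
        child = expand(node, rng)
        expansions += 1
        if value(child) >= threshold:
            stack.append([child, d + 1, 0])  # promising: go deeper
        else:
            entry[2] += 1  # rejected: count the failure and resample from the same node
    return None, expansions


def make_toy_expand(p_good: float) -> Callable[[Node, random.Random], Node]:
    """Hypothetical step that is 'good' with probability p_good (lower p_good = harder instance)."""
    def expand(node: Node, rng: random.Random) -> Node:
        steps, good = node
        return (steps + 1, good + (1 if rng.random() < p_good else 0))
    return expand


def good_fraction(node: Node) -> float:
    """Hypothetical verifier on a partial sample: fraction of good steps so far."""
    steps, good = node
    return good / steps


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.3):
        costs = [threshold_dfs((0, 0), make_toy_expand(p), good_fraction, depth=10,
                               threshold=0.5, rng=random.Random(i))[1] for i in range(200)]
        print(f"p_good={p}: mean expansions {sum(costs) / len(costs):.1f}")
```

Nothing in the loop sets a per-instance budget. Easy instances finish in roughly `depth` expansions. Harder ones trigger more rejections and backtracks, so they consume more compute automatically, capped by `max_expansions`.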
My main reservations concern the strength of the paper's contribution and significance, the clarity of exposition, and a few statements that seem potentially misleading or inaccurate.
### Contribution
The global search methods (BFS/DFS) proposed in the paper largely seem to be an alternate perspective on, or modifications of, existing methods rather than fundamentally new algorithms.
- The authors acknowledge that FK-steering/DAS are instantiations of BFS, but I do not see what new insights are prov…
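The review describes FK-steering/DAS as instantiations of BFS. The breadth-first pattern it refers to can be sketched as particle resampling. Every particle in the frontier is expanded by one stochastic denoising step, and the frontier is then resampled in proportion to a reward-based weight. This is a toy under stated assumptions, not FK-steering, DAS, or the paper's BFS. The incremental potential exp(guidance * (r(child) - r(parent))) is one illustrative choice. `denoise_step` is a Gaussian random walk standing in for a diffusion step. The names `reward`, `denoise_step`, and `bfs_resample` are hypothetical.

```python
import math
import random
from typing import List, Optional


def reward(x: float) -> float:
    """Hypothetical terminal reward: prefer samples near x = 1."""
    return -(x - 1.0) ** 2


def denoise_step(x: float, rng: random.Random, noise: float = 0.3) -> float:
    """Toy stand-in for one stochastic reverse-diffusion step."""
    return x + rng.gauss(0.0, noise)


def bfs_resample(num_particles: int = 64, num_steps: int = 10, guidance: float = 4.0,
                 resample: bool = True, rng: Optional[random.Random] = None) -> List[float]:
    """Breadth-first search as particle resampling: expand every particle, then reweight by reward."""
    rng = rng or random.Random(0)
    particles = [0.0] * num_particles
    for _ in range(num_steps):
        proposals = [denoise_step(x, rng) for x in particles]  # expand the whole frontier
        if resample:
            # Incremental potential exp(guidance * (r(child) - r(parent))); along any lineage
            # these telescope to exp(guidance * (r(final) - r(start))).
            log_w = [guidance * (reward(y) - reward(x)) for x, y in zip(particles, proposals)]
            top = max(log_w)
            weights = [math.exp(lw - top) for lw in log_w]  # shift by max for numerical stability
            particles = rng.choices(proposals, weights=weights, k=num_particles)
        else:
            particles = proposals  # plain sampling baseline: no selection
    return particles


if __name__ == "__main__":
    def mean_reward(ps: List[float]) -> float:
        return sum(reward(p) for p in ps) / len(ps)

    print("resampled:", round(mean_reward(bfs_resample(rng=random.Random(1))), 3))
    print("plain:    ", round(mean_reward(bfs_resample(resample=False, rng=random.Random(1))), 3))
```

Because the potentials telescope, the resampled population roughly targets the base distribution tilted by exp(guidance * r(final)). A different potential yields a different BFS variant.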
1. It explores various design choices for inference-time scaling of diffusion models across global and local search, and proposes a unified framework that jointly scales both.
2. It proposes a depth-first search (DFS) method that can adaptively allocate compute.
3. Decomposed ablations on global search (Section 5.2) and local search (Section 5.3.1) show the effectiveness and necessity of both methods.
4. Experiments demonstrate that distilling TTS samples can be a p…
1. The use of Langevin MCMC prevents applying local search with non-differentiable rewards and prevents naturally extending the method to discrete diffusion. There are other MCMC alternatives, such as Metropolis-Hastings variants (e.g., predictor-corrector), but the choice of Langevin MCMC is not sufficiently justified.
2. Because only a single reward value is reported in the image experiments, it is hard to tell whether there is severe reward hacking and the method is generating broken images. For instance, excessive g…
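The reviewer argues that Langevin MCMC requires reward gradients, while Metropolis-Hastings variants do not. The sketch below illustrates that contrast with random-walk Metropolis on a reward-tilted toy target. It uses only reward evaluations, so a step-function reward works even though its gradient is zero almost everywhere. This is a standard textbook sampler, not the paper's method and not a predictor-corrector scheme. It relies on a tractable prior density, which real diffusion models generally do not expose, and that is the main obstacle to exact MH there. The names `log_prior`, `step_reward`, and `rw_metropolis` are hypothetical.

```python
import math
import random
from typing import List, Optional


def log_prior(x: float) -> float:
    """Unnormalized log-density of a toy N(0, 1) prior (tractable only because this is a toy)."""
    return -0.5 * x * x


def step_reward(x: float) -> float:
    """Hypothetical non-differentiable reward: 1 if x > 1, else 0 (zero gradient almost everywhere)."""
    return 1.0 if x > 1.0 else 0.0


def rw_metropolis(num_steps: int, guidance: float = 3.0, step: float = 1.0,
                  x0: float = 0.0, rng: Optional[random.Random] = None) -> List[float]:
    """Random-walk Metropolis on log p(x) + guidance * r(x); uses reward values only, no gradients."""
    rng = rng or random.Random(0)

    def log_target(z: float) -> float:
        return log_prior(z) + guidance * step_reward(z)

    x, lp = x0, log_target(x0)
    samples = []
    for _ in range(num_steps):
        y = x + rng.gauss(0.0, step)  # symmetric proposal: Hastings ratio reduces to target ratio
        lq = log_target(y)
        if rng.random() < math.exp(min(0.0, lq - lp)):  # accept with prob min(1, pi(y) / pi(x))
            x, lp = y, lq
        samples.append(x)
    return samples


if __name__ == "__main__":
    chain = rw_metropolis(50_000, rng=random.Random(2))[1000:]  # drop burn-in
    print("fraction with reward 1:", round(sum(v > 1.0 for v in chain) / len(chain), 3))
```

With guidance 3, the tilted target puts about 79% of its mass above x = 1, compared with about 16% under the prior. The chain should reproduce that shift even though no gradient is ever computed.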
Taxonomy
Methods, Diffusion
