Compositional Diffusion with Guided Search for Long-Horizon Planning
Utkarsh A Mishra, David He, Yongxin Chen, Danfei Xu

TL;DR
The paper introduces Compositional Diffusion with Guided Search (CDGS), a method that improves long-horizon planning by embedding search within diffusion models to handle multimodal local distributions and ensure global coherence.
Contribution
It presents a novel approach that integrates search into diffusion processes, enabling effective compositional planning across diverse domains with multimodal local distributions.
Findings
Matches oracle performance on robot manipulation tasks
Outperforms non-compositional baselines
Enables coherent long-horizon generation in images and videos
Abstract
Generative models have emerged as powerful tools for planning, with compositional approaches offering particular promise for modeling long-horizon task distributions by composing together local, modular generative models. This compositional paradigm spans diverse domains, from multi-step manipulation planning to panoramic image synthesis to long video generation. However, compositional generative models face a critical challenge: when local distributions are multimodal, existing composition methods average incompatible modes, producing plans that are neither locally feasible nor globally coherent. We propose Compositional Diffusion with Guided Search (CDGS), which addresses this mode averaging problem by embedding search directly within the diffusion denoising process. Our method explores diverse combinations of local modes through population-based sampling, prunes infeasible candidates…
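The mode-averaging problem named in the abstract can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes a hypothetical 1-D local distribution with two well-separated Gaussian modes (the names `mixture_logpdf`, `modes`, and `std` are illustrative) and compares the log-density at a true mode with the log-density at the average of the two modes.

```python
import math


def mixture_logpdf(x, modes=(-2.0, 2.0), std=0.3):
    """Log-density of an equal-weight 1-D Gaussian mixture (toy assumption)."""
    comps = [
        -0.5 * ((x - m) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))
        for m in modes
    ]
    top = max(comps)
    # Numerically stable log-sum-exp over the components, then equal weights.
    return top + math.log(sum(math.exp(c - top) for c in comps)) - math.log(len(modes))


modes = (-2.0, 2.0)
averaged = sum(modes) / len(modes)  # the point a naive average of the modes lands on

logp_mode = mixture_logpdf(modes[0])  # log-density at a valid local mode
logp_avg = mixture_logpdf(averaged)   # log-density at the averaged point

print(f"log p(mode)    = {logp_mode:.2f}")
print(f"log p(average) = {logp_avg:.2f}")
```

Under these assumptions the averaged point falls in the low-density gap between the modes, which is the sense in which an averaged plan is not locally feasible.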
Peer Reviews
Decision: ICLR 2026 Oral
1. The paper identifies a genuine issue in compositional generative modeling: mode-averaging across multimodal local distributions, which degrades global coherence in long-horizon tasks. In addition, CDGS is evaluated across robotics, image synthesis, and video generation, demonstrating domain-agnostic adaptability at the conceptual level. 2. The method is implemented end-to-end with population-based inference and ablations (with/without pruning, resampling, scaling analysis), demonstrating the
1. The population-based denoising, iterative resampling, and pruning loops likely incur heavy compute costs. No runtime or complexity analysis is provided to justify practicality. Classical planners, such as SVG-MPPI [1] and Reverse-KL MPPI [2], already produce mode-seeking behavior in closed form, with stronger theoretical rigor and lower computational overhead. Within this realm, CDGS lacks a clear justification for its adoption over these established alternatives. 2. CDGS essentially reuses
1. The paper introduces a highly novel and significant approach to long-horizon planning. The combination of compositional generative models (diffusion models) with explicit guided search is a powerful paradigm that effectively balances generation diversity with goal-directedness. This work has the potential to influence future research in planning, robotics, and generative modeling. 2. The experimental evaluation is thorough and convincing. The authors demonstrate the superiority of CDGS across
1. The proposed method seems computationally intensive. The iterative nature of resampling and refining plans at each step of the search could be very time-consuming, which might limit its applicability in real-time or resource-constrained settings. A more detailed analysis of the computational complexity and wall-clock time comparisons with baselines would be beneficial. 2. The paper could benefit from providing more details on how the states and actions for the robotic planning tasks are repre
- **Clear Problem Formulation and an Effective Solution**: The paper clearly articulates critical limitations of prior score-averaging-based compositional diffusion methods: the mode-averaging and global incoherence problems. It then introduces a well-motivated and effective solution, CDGS, which synergistically combines two key strategies: (1) iterative resampling to enforce global coherence through message passing, and (2) a novel likelihood-based pruning mechanism to filter out locally infeas
- **Lack of Adaptive Computation and Backtracking**: While CDGS effectively overcomes limitations of prior methods through inference-time search, it lacks adaptability.
  - **Fixed Computational Overhead**: The proposed method utilizes a fixed computational budget regardless of task difficulty, as determined by hyperparameters (Appendix G.2). This can lead to inefficient, excessive computation for simpler tasks, or insufficient computation for more complex ones.
  - **Absence of Backtracking**: C
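The reviews refer to a population of candidates that is repeatedly pruned and resampled under a fixed budget. The sketch below is a toy discrete analogue of such a loop, not the authors' algorithm: there is no diffusion model, each segment simply picks one of two hypothetical local modes, and "global coherence" is assumed to mean that adjacent segments agree. All names and parameters (`coherence`, `prune_and_resample`, `population`, `keep`, `rounds`) are illustrative.

```python
import random


def coherence(plan):
    """Count adjacent segment pairs that picked the same mode (higher is better)."""
    return sum(a == b for a, b in zip(plan, plan[1:]))


def prune_and_resample(num_segments=6, population=32, keep=8, rounds=20, seed=0):
    """Toy population search: fixed budget of `rounds` prune/resample steps."""
    rng = random.Random(seed)
    # Each segment independently picks one of two equally likely local modes.
    plans = [[rng.choice((-1, 1)) for _ in range(num_segments)]
             for _ in range(population)]
    for _ in range(rounds):
        # Prune: keep only the most coherent candidates.
        plans.sort(key=coherence, reverse=True)
        survivors = plans[:keep]
        # Resample: refill the population by copying survivors and
        # redrawing one segment from its local modes.
        plans = [list(p) for p in survivors]
        while len(plans) < population:
            child = list(rng.choice(survivors))
            child[rng.randrange(num_segments)] = rng.choice((-1, 1))
            plans.append(child)
    return max(plans, key=coherence)


best = prune_and_resample()
print(best, coherence(best))
```

Because `rounds` and `population` are fixed up front, this toy spends the same compute on easy and hard instances, which mirrors the fixed-budget concern raised in the review above.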
Taxonomy
Topics: Robot Manipulation and Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
