# Tree-Guided Diffusion Planner

**Authors:** Hyeonseong Jeon, Cheolhong Min, Jaesik Park

arXiv: 2508.21800 · 2025-11-11

## TL;DR

The paper introduces the Tree-guided Diffusion Planner (TDP), a zero-shot, test-time planning method that pairs pretrained diffusion models with structured tree search over trajectories to improve exploration and task success in non-convex environments.

## Contribution

It proposes a tree-guided, zero-shot planning framework that balances exploration and exploitation without task-specific training. The framework targets settings where gradient guidance struggles: non-convex objectives, non-differentiable constraints, and multi-reward structures.

## Key findings

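- TDP outperforms state-of-the-art methods on all three evaluated tasks: maze gold-picking, robot arm block manipulation, and AntMaze multi-goal exploration.
- TDP balances exploration and exploitation with a bi-level sampling process. Particle guidance produces diverse parent trajectories, and fast conditional denoising refines sub-trajectories toward the task objective.
- The approach generalizes zero-shot across diverse control problems. It relies only on a pretrained diffusion model and test-time reward signals, not on task-specific training or value estimators.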
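The bi-level process in the findings can be illustrated with a toy tree search. This is a minimal sketch, not TDP itself. `sample_parent`, `refine_suffix`, `reward`, and `tree_plan` are hypothetical names. A 2D random walk stands in for the pretrained diffusion sampler, and a simple goal-seeking nudge stands in for conditional denoising. What the sketch shows is the control flow: sample diverse parents, branch each one at a waypoint, refine only the suffix, and select the best leaf using reward evaluations alone.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def sample_parent(rng: random.Random, start: Point, horizon: int) -> Trajectory:
    # Hypothetical stand-in for a pretrained diffusion sampler: a 2D random walk.
    traj = [start]
    for _ in range(horizon - 1):
        x, y = traj[-1]
        traj.append((x + rng.uniform(-1.0, 1.0), y + rng.uniform(-1.0, 1.0)))
    return traj


def refine_suffix(traj: Trajectory, branch: int, goal: Point,
                  steps: int = 10, lr: float = 0.2) -> Trajectory:
    # Toy stand-in for fast conditional denoising (assumption): keep the prefix up
    # to `branch` fixed and nudge later waypoints toward `goal`, more strongly near the end.
    out = list(traj)
    span = len(out) - 1 - branch
    for _ in range(steps):
        for t in range(branch + 1, len(out)):
            w = lr * (t - branch) / span
            x, y = out[t]
            out[t] = (x + w * (goal[0] - x), y + w * (goal[1] - y))
    return out


def reward(traj: Trajectory, goal: Point) -> float:
    # Only evaluated, never differentiated, so a non-differentiable reward also works.
    return -math.dist(traj[-1], goal)


def tree_plan(rng: random.Random, start: Point, goal: Point, n_parents: int = 4,
              n_children: int = 3, horizon: int = 16) -> Trajectory:
    # Level 1: diverse parent trajectories (exploration).
    parents = [sample_parent(rng, start, horizon) for _ in range(n_parents)]
    leaves = list(parents)
    # Level 2: branch each parent at a random waypoint and refine the suffix (exploitation).
    for parent in parents:
        for _ in range(n_children):
            branch = rng.randrange(1, horizon - 1)
            leaves.append(refine_suffix(parent, branch, goal))
    # Select the best leaf purely by comparing reward values.
    return max(leaves, key=lambda tr: reward(tr, goal))


plan = tree_plan(random.Random(0), start=(0.0, 0.0), goal=(5.0, 5.0))
print(len(plan), round(reward(plan, (5.0, 5.0)), 3))
```

Because selection only compares reward values, `reward` could be replaced by a non-differentiable score (for example, counting collected gold items in a maze) without changing `tree_plan`.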

## Abstract

Planning with pretrained diffusion models has emerged as a promising approach for solving test-time guided control problems. Standard gradient guidance typically performs optimally under convex, differentiable reward landscapes. However, it shows substantially reduced effectiveness in real-world scenarios with non-convex objectives, non-differentiable constraints, and multi-reward structures. Furthermore, recent supervised planning approaches require task-specific training or value estimators, which limits test-time flexibility and zero-shot generalization. We propose a Tree-guided Diffusion Planner (TDP), a zero-shot test-time planning framework that balances exploration and exploitation through structured trajectory generation. We frame test-time planning as a tree search problem using a bi-level sampling process: (1) diverse parent trajectories are produced via training-free particle guidance to encourage broad exploration, and (2) sub-trajectories are refined through fast conditional denoising guided by task objectives. TDP addresses the limitations of gradient guidance by exploring diverse trajectory regions and harnessing gradient information across this expanded solution space using only pretrained models and test-time reward signals. We evaluate TDP on three diverse tasks: maze gold-picking, robot arm block manipulation, and AntMaze multi-goal exploration. TDP consistently outperforms state-of-the-art approaches on all tasks. The project page can be found at: https://tree-diffusion-planner.github.io.
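The first level of the bi-level process relies on training-free particle guidance, which adds a repulsive term between jointly sampled particles so they do not collapse onto the same mode. The sketch below illustrates that idea under stated assumptions. The RBF kernel, the bandwidth, and the names `rbf_repulsion`, `guided_step`, and `run` are hypothetical choices, not taken from the paper, and a fixed point `denoised` stands in for a diffusion model's prediction. The exact potential TDP uses is not given in this summary.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def rbf_repulsion(particles: List[Vec], bandwidth: float = 1.0) -> List[Vec]:
    # Gradient w.r.t. x_i of -sum_{j != i} exp(-||x_i - x_j||^2 / bandwidth).
    # Each term points from x_j toward x_i, so nearby particles push each other apart.
    forces = []
    for i, (xi, yi) in enumerate(particles):
        fx = fy = 0.0
        for j, (xj, yj) in enumerate(particles):
            if i == j:
                continue
            dx, dy = xi - xj, yi - yj
            k = math.exp(-(dx * dx + dy * dy) / bandwidth)
            fx += 2.0 * dx / bandwidth * k
            fy += 2.0 * dy / bandwidth * k
        forces.append((fx, fy))
    return forces


def guided_step(particles: List[Vec], denoised: Vec, step: float = 0.1,
                alpha: float = 0.5, bandwidth: float = 1.0) -> List[Vec]:
    # Toy update: move toward a stand-in denoiser prediction `denoised`, plus
    # alpha-weighted repulsion. alpha = 0 recovers independent sampling.
    rep = rbf_repulsion(particles, bandwidth)
    return [
        (x + step * (denoised[0] - x) + alpha * step * fx,
         y + step * (denoised[1] - y) + alpha * step * fy)
        for (x, y), (fx, fy) in zip(particles, rep)
    ]


def min_pairwise(particles: List[Vec]) -> float:
    # Smallest distance between any two particles (a crude diversity measure).
    return min(math.dist(p, q) for i, p in enumerate(particles) for q in particles[i + 1:])


def run(alpha: float, n_steps: int = 50) -> List[Vec]:
    # Four particles start close together and share the same stand-in prediction.
    ps = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
    for _ in range(n_steps):
        ps = guided_step(ps, denoised=(0.0, 0.0), alpha=alpha)
    return ps


print(min_pairwise(run(alpha=0.0)), min_pairwise(run(alpha=0.5)))
```

With `alpha=0.0` the particles collapse toward the shared prediction. With repulsion enabled they settle at a nonzero spacing set by the balance between the pull term and the kernel term, which is the kind of diversity the parent-trajectory stage is meant to provide.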

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21800/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21800/full.md

---
Source: https://tomesphere.com/paper/2508.21800