Policy Gradient with Tree Search: Avoiding Local Optimas through Lookahead

Uri Koren; Navdeep Kumar; Uri Gadot; Giorgia Ramponi; Kfir Yehuda Levy; Shie Mannor

arXiv:2506.07054 · cs.LG · June 10, 2025

Open Access

TL;DR

This paper introduces Policy Gradient with Tree Search (PGTS), a method that uses lookahead to avoid local optima in reinforcement learning, backed by theoretical guarantees and empirical success in complex environments.

Contribution

The work presents a novel PGTS approach that integrates lookahead with policy gradients, providing theoretical analysis and empirical evidence of improved performance over standard methods.

Findings

1. PGTS reduces undesirable stationary points with increased lookahead depth.
2. Theoretical analysis shows improved worst-case performance with deeper lookahead.
3. Empirical results demonstrate PGTS's ability to escape local traps and find better policies.

Abstract

Classical policy gradient (PG) methods in reinforcement learning frequently converge to suboptimal local optima, a challenge exacerbated in large or complex environments. This work investigates Policy Gradient with Tree Search (PGTS), an approach that integrates an $m$-step lookahead mechanism to enhance policy optimization. We provide theoretical analysis demonstrating that increasing the tree search depth $m$ monotonically reduces the set of undesirable stationary points and, consequently, improves the worst-case performance of any resulting stationary policy. Critically, our analysis accommodates practical scenarios where policy updates are restricted to states visited by the current policy, rather than requiring updates across the entire state space. Empirical evaluations on diverse MDP structures, including Ladder, Tightrope, and Gridworld environments, illustrate PGTS's ability to…
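
To make the idea of an $m$-step lookahead concrete, here is a minimal sketch on a small tabular MDP. It is not the authors' implementation. It assumes one common reading of lookahead: start from the policy's own action values $Q^\pi$ and apply the Bellman optimality backup $m-1$ times, so that $m = 1$ recovers the quantity used by standard policy gradient. The function names (`policy_values`, `lookahead_q`), the indexing convention for $m$, and the random MDP are all illustrative assumptions.

```python
import numpy as np

def policy_values(P, R, pi, gamma):
    """Exact V^pi and Q^pi for a tabular MDP.

    P: (S, A, S) transition probabilities, R: (S, A) rewards,
    pi: (S, A) action probabilities per state.
    """
    S = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state kernel under pi
    r_pi = np.einsum("sa,sa->s", pi, R)     # expected one-step reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = R + gamma * P @ V
    return V, Q

def lookahead_q(P, R, pi, gamma, m):
    """Hypothetical m-step lookahead action values.

    Assumed convention: m = 1 returns Q^pi; each extra step applies one
    greedy (max over actions) Bellman backup on top of the previous values.
    """
    _, Q = policy_values(P, R, pi, gamma)
    for _ in range(m - 1):
        V = Q.max(axis=1)          # best action value at every state
        Q = R + gamma * P @ V      # one more step of lookahead
    return Q

# Illustrative random MDP (not one of the paper's environments).
rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S), rows sum to 1
R = rng.uniform(size=(S, A))
pi = np.full((S, A), 1.0 / A)               # uniform policy

qs = [lookahead_q(P, R, pi, gamma, m) for m in (1, 2, 5)]
```

Under this convention the lookahead values are elementwise non-decreasing in $m$, because each greedy backup can only improve on following the policy; a policy gradient method would then weight its updates by these values instead of by $Q^\pi$.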

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Artificial Intelligence in Games