Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction
Assaf Hallak, Gal Dalal, Steven Dalton, Iuri Frosio, Shie Mannor, Gal Chechik

TL;DR
This paper introduces an off-policy correction for tree search in reinforcement learning that improves pre-trained agents without retraining. It also addresses scalability with a GPU-based breadth-first search method (Batch-BFS) that makes deeper searches practical.
Contribution
It proposes an off-policy correction method to fix distribution shift issues in tree search and introduces Batch-BFS for scalable, efficient tree search on GPUs, enabling improved agent performance.
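The level-by-level expansion behind a breadth-first tree search can be sketched as follows. This is a minimal CPU illustration under stated assumptions, not the paper's Batch-BFS implementation: the function `bfs_tree_search`, the toy `chain_step` environment, and the zero value function are all hypothetical names introduced here. Each depth level is expanded in full before the next one starts, which is the property that would let a GPU process an entire level as one batch.

```python
def bfs_tree_search(root, step, value, num_actions, depth, gamma=0.99):
    """Exhaustive breadth-first search to a fixed depth (illustrative sketch).

    step(state, action) -> (next_state, reward) is an assumed deterministic model.
    value(state) -> float is an assumed pre-trained value estimate.
    Returns (best first action, score of the best leaf).
    """
    # Each frontier entry: (state, discounted return so far, first action taken).
    frontier = [(root, 0.0, None)]
    for d in range(depth):
        next_frontier = []
        # Expand the whole level at once; on a GPU this loop would be one batch.
        for state, ret, first in frontier:
            for a in range(num_actions):
                nxt, r = step(state, a)
                first_action = a if first is None else first
                next_frontier.append((nxt, ret + (gamma ** d) * r, first_action))
        frontier = next_frontier
    # Score every leaf: accumulated reward plus the discounted value estimate.
    scores = [ret + (gamma ** depth) * value(s) for s, ret, _ in frontier]
    best = max(range(len(frontier)), key=scores.__getitem__)
    return frontier[best][2], scores[best]


def chain_step(state, action):
    # Hypothetical toy chain: action 1 moves right, action 0 stays put.
    nxt = state + 1 if action == 1 else state
    reward = 1.0 if nxt == 3 else 0.0
    return nxt, reward


action, score = bfs_tree_search(
    root=0, step=chain_step, value=lambda s: 0.0,
    num_actions=2, depth=3, gamma=1.0,
)
```

In this toy chain the only way to collect the reward within three steps is to move right every time, so the search selects action 1 at the root. The frontier grows as `num_actions ** depth`, which is why deep searches need the kind of parallel hardware the paper targets.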
Findings
Significantly improved pre-trained Rainbow agents without retraining.
Enhanced scalability allowing deeper tree searches with GPU acceleration.
Improved DQN agents on Atari games using tree search techniques.
Abstract
Tree Search (TS) is crucial to some of the most influential successes in reinforcement learning. Here, we tackle two major challenges with TS that limit its usability: distribution shift and scalability. We first discover and analyze a counter-intuitive phenomenon: action selection through TS and a pre-trained value function often leads to lower performance compared to the original pre-trained agent, even when having access to the exact state and reward in future steps. We show this is due to a distribution shift to areas where value estimates are highly inaccurate and analyze this effect using Extreme Value theory. To overcome this problem, we introduce a novel off-policy correction term that accounts for the mismatch between the pre-trained value and its corresponding TS policy by penalizing under-sampled trajectories. We prove that our correction eliminates the…
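One way to build intuition for why selecting actions by maximizing over inaccurate value estimates can hurt is the following small simulation. It is an illustration under assumptions made here, not the paper's analysis: every leaf is given a true value of zero, estimation errors are drawn as independent Gaussian noise, and `mean_max_estimate` is a hypothetical helper. The maximum over noisy estimates is biased upward, and the bias grows with the number of leaves being compared.

```python
import random


def mean_max_estimate(num_leaves, noise_std=1.0, trials=2000, seed=0):
    """Average of the maximum noisy estimate when every true value is 0.

    Assumption for illustration: errors are i.i.d. Gaussian with the given std.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # A search that picks the best-looking leaf sees the max of the noise.
        total += max(rng.gauss(0.0, noise_std) for _ in range(num_leaves))
    return total / trials


small_tree = mean_max_estimate(4)    # e.g. a shallow search with 4 leaves
large_tree = mean_max_estimate(64)   # e.g. a deeper search with 64 leaves
```

Both averages come out positive even though no leaf is truly better than any other, and the larger tree shows the larger apparent gain. Under these assumptions, a deeper search is more likely to follow a leaf whose estimate is simply the most wrong.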
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Evolutionary Algorithms and Applications
Methods: Convolution · Dense Connections · Q-Learning · Deep Q-Network · Spatio-temporal stability analysis
