Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction

Assaf Hallak; Gal Dalal; Steven Dalton; Iuri Frosio; Shie Mannor; Gal Chechik

arXiv:2107.01715 · cs.AI · February 7, 2023 · 1 citation

TL;DR

This paper introduces a novel off-policy correction for tree search in reinforcement learning that improves pre-trained agents without retraining. It also addresses scalability with a GPU-based breadth-first search method, and reports significant performance gains from the combination.

Contribution

The paper proposes an off-policy correction that counters the distribution shift caused by tree search, and introduces Batch-BFS, a breadth-first tree search that runs on GPUs and scales to deeper trees, to improve agent performance.
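The breadth-first idea can be illustrated with a minimal sketch. This is not the authors' implementation: it runs on the CPU with NumPy rather than on a GPU, it uses a toy deterministic chain environment, and the names `step_batch`, `q_values`, and `batch_bfs_action` are hypothetical. The off-policy correction is deliberately omitted, since its exact form is not given on this page. The sketch only shows how every node at a given depth can be expanded in one batched call, with all leaves scored together by a (stand-in) pre-trained Q-function.

```python
import numpy as np

# Toy setup (assumption): a 5-state chain, two actions, discount 0.9.
N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9


def step_batch(states, actions):
    """Hypothetical deterministic model, applied to a whole batch at once.

    Action 0 moves left, action 1 moves right; reward 1.0 on arriving
    at the last state of the chain.
    """
    moves = np.where(actions == 1, 1, -1)
    next_states = np.clip(states + moves, 0, N_STATES - 1)
    rewards = (next_states == N_STATES - 1).astype(np.float64)
    return next_states, rewards


def q_values(states):
    """Stand-in for a pre-trained Q-network: returns zeros, shape [batch, N_ACTIONS]."""
    return np.zeros((len(states), N_ACTIONS))


def batch_bfs_action(root_state, depth):
    """Exhaustive depth-limited search, one batched expansion per level."""
    states = np.array([root_state])   # frontier: all nodes at the current depth
    returns = np.zeros(1)             # discounted reward accumulated per node
    for d in range(depth):
        # Pair every frontier node with every action and step them together.
        rep_states = np.repeat(states, N_ACTIONS)
        actions = np.tile(np.arange(N_ACTIONS), len(states))
        states, rewards = step_batch(rep_states, actions)
        returns = np.repeat(returns, N_ACTIONS) + (GAMMA ** d) * rewards
    # Score all leaves in a single batched call to the value function.
    leaf_values = returns + (GAMMA ** depth) * q_values(states).max(axis=1)
    # Leaves are ordered with the root action as the most significant index,
    # so each row of the reshape holds the leaves under one root action.
    per_root_action = leaf_values.reshape(N_ACTIONS, -1).max(axis=1)
    return int(np.argmax(per_root_action)), per_root_action


action, values = batch_bfs_action(root_state=2, depth=2)
print(action, values)
```

Because the frontier is a flat array, the cost of each level is one vectorized model call rather than one call per node, which is the property that makes this style of search a natural fit for a GPU.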

Findings

01  Significantly improved pre-trained Rainbow agents without retraining.

02  Enhanced scalability allowing deeper tree searches with GPU acceleration.

03  Improved DQN agents on Atari games using tree search techniques.

Abstract

Tree Search (TS) is crucial to some of the most influential successes in reinforcement learning. Here, we tackle two major challenges with TS that limit its usability: distribution shift and scalability. We first discover and analyze a counter-intuitive phenomenon: action selection through TS and a pre-trained value function often leads to lower performance compared to the original pre-trained agent, even when having access to the exact state and reward in future steps. We show this is due to a distribution shift to areas where value estimates are highly inaccurate and analyze this effect using Extreme Value theory. To overcome this problem, we introduce a novel off-policy correction term that accounts for the mismatch between the pre-trained value and its corresponding TS policy by penalizing under-sampled trajectories. We prove that our correction eliminates the…

Code & Models

Repositories

nvlabs/bcts (PyTorch, official)

Videos

Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Evolutionary Algorithms and Applications

Methods: Convolution · Dense Connections · Q-Learning · Deep Q-Network · Spatio-temporal stability analysis