Planning and Learning with Adaptive Lookahead

Aviv Rosenberg; Assaf Hallak; Shie Mannor; Gal Chechik; Gal Dalal

arXiv:2201.12403 · cs.LG · January 19, 2023

TL;DR

This paper introduces a theoretically justified method for adaptively selecting the planning horizon in reinforcement learning as a function of the state-dependent value estimate, and demonstrates it in a maze environment and on Atari.

Contribution

It proposes a novel adaptive lookahead strategy based on state-dependent value estimates and develops a deep Q-network algorithm incorporating this approach.

Findings

01 Effective in maze environments

02 Improves performance in Atari games

03 Balances iteration count and computational complexity

Abstract

Some of the most powerful reinforcement learning frameworks use planning for action selection. Interestingly, their planning horizon is either fixed or determined arbitrarily by the state visitation history. Here, we expand beyond the naive fixed horizon and propose a theoretically justified strategy for adaptive selection of the planning horizon as a function of the state-dependent value estimate. We propose two variants for lookahead selection and analyze the trade-off between iteration count and computational complexity per iteration. We then devise a corresponding deep Q-network algorithm with an adaptive tree search horizon. We separate the value estimation per depth to compensate for the off-policy discrepancy between depths. Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and Atari.
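
To make the idea of a state-dependent planning horizon concrete, here is a minimal tabular sketch in Python. It is not the authors' algorithm: the chain MDP, the threshold rule in `choose_depth`, and all names and constants (`GAMMA`, `N_STATES`, `threshold`, `max_depth`) are hypothetical choices made for illustration. The sketch assumes a rule that plans deeper only in states whose one-step Bellman residual is large, and plans one step ahead elsewhere.

```python
from typing import List, Tuple

# Hypothetical toy problem: a deterministic chain with an absorbing goal state.
GAMMA = 0.9
N_STATES = 6
ACTIONS = (-1, +1)  # move left or right


def step(state: int, action: int) -> Tuple[int, float]:
    """Return (next_state, reward); entering the goal pays 1, the goal is absorbing."""
    goal = N_STATES - 1
    if state == goal:
        return state, 0.0
    nxt = min(max(state + action, 0), goal)
    return nxt, (1.0 if nxt == goal else 0.0)


def lookahead_value(state: int, depth: int, values: List[float]) -> float:
    """Best discounted return over `depth` steps, bootstrapping from `values` at the leaves."""
    if depth == 0:
        return values[state]
    best = float("-inf")
    for action in ACTIONS:
        nxt, reward = step(state, action)
        best = max(best, reward + GAMMA * lookahead_value(nxt, depth - 1, values))
    return best


def choose_depth(state: int, values: List[float],
                 threshold: float = 0.05, max_depth: int = 4) -> int:
    """Assumed rule (not from the paper): plan deeper where the one-step residual is large."""
    residual = abs(lookahead_value(state, 1, values) - values[state])
    return max_depth if residual > threshold else 1


def adaptive_sweep(values: List[float]) -> Tuple[List[float], List[int]]:
    """One synchronous sweep in which every state is backed up with its own horizon."""
    depths = [choose_depth(s, values) for s in range(N_STATES)]
    new_values = [lookahead_value(s, depths[s], values) for s in range(N_STATES)]
    return new_values, depths


if __name__ == "__main__":
    values = [0.0] * N_STATES
    for sweep in range(8):
        values, depths = adaptive_sweep(values)
        print(sweep, depths, [round(v, 4) for v in values])
```

Because an exhaustive lookahead of depth h costs on the order of |A|^h backups per state, spending depth only where the estimate looks inaccurate is one way to trade iteration count against per-iteration computation, which is the trade-off the abstract refers to.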

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)