A Look at Value-Based Decision-Time vs. Background Planning Methods Across Different Settings

Safa Alver; Doina Precup

arXiv:2206.08442 · cs.LG · August 13, 2024

Open Access

TL;DR

This paper compares value-based decision-time and background planning methods in reinforcement learning, providing theoretical insights and experiments that show modern decision-time methods often outperform background methods in various settings.

Contribution

It offers the first theoretical comparison of value-based decision-time and background planning, and empirically validates their performance differences in modern instantiations.

Findings

01. Modern value-based decision-time planning can outperform background planning.
02. Simplest instantiations perform similarly, but modern versions differ.
03. Theoretical results support experimental findings.

Abstract

In model-based reinforcement learning (RL), an agent can leverage a learned model to improve its way of behaving in different ways. Two of the prevalent ways to do this are through decision-time and background planning methods. In this study, we are interested in understanding how the value-based versions of these two planning methods will compare against each other across different settings. Towards this goal, we first consider the simplest instantiations of value-based decision-time and background planning methods and provide theoretical results on which one will perform better in the regular RL and transfer learning settings. Then, we consider the modern instantiations of them and provide hypotheses on which one will perform better in the same settings. Finally, we perform illustrative experiments to validate these theoretical results and hypotheses. Overall, our findings suggest…
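To make the two planning styles named in the abstract concrete, here is a minimal Python sketch. It is not the authors' code or experimental setup: the 5-state chain environment, the exact tabular `model`, the Dyna-style update rule, the fixed-depth lookahead, and all hyperparameters are assumptions chosen purely for illustration. Background planning uses the model ahead of time to improve a value table that the agent later acts greedily on; decision-time planning uses the model at the moment of acting to search forward from the current state.

```python
import random

# Hypothetical 5-state chain: states 0..4, actions 0 (left) and 1 (right).
# Reaching state 4 ends the episode with reward 1. All of this is assumed.
N_STATES, ACTIONS, GAMMA = 5, (0, 1), 0.9

def model(s, a):
    """Assumed (here: exact) model returning (next_state, reward, done)."""
    s2 = max(s - 1, 0) if a == 0 else s + 1
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

def background_planning(n_updates=2000, alpha=0.5, seed=0):
    """Dyna-style sketch: improve a Q-table offline from simulated transitions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(n_updates):
        s, a = rng.randrange(N_STATES - 1), rng.choice(ACTIONS)
        s2, r, done = model(s, a)
        target = r + (0.0 if done else GAMMA * max(q[s2]))
        q[s][a] += alpha * (target - q[s][a])
    return q

def decision_time_planning(s, depth, leaf_value=lambda s: 0.0):
    """Fixed-depth lookahead from state s; returns (value, best first action)."""
    best_v, best_a = float("-inf"), None
    for a in ACTIONS:
        s2, r, done = model(s, a)
        if done:
            v = r
        elif depth == 1:
            v = r + GAMMA * leaf_value(s2)  # bootstrap at the search frontier
        else:
            v = r + GAMMA * decision_time_planning(s2, depth - 1, leaf_value)[0]
        if v > best_v:
            best_v, best_a = v, a
    return best_v, best_a

# Background planning: plan first, then act greedily on the learned table.
q = background_planning()
greedy_background = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]

# Decision-time planning: search from the current state when an action is needed.
value, action = decision_time_planning(0, depth=4)
```

In this toy case both approaches select the "right" action in every non-terminal state, which is only meant to show the mechanics. The differences the paper studies arise with learned, imperfect models and with the modern instantiations of each method, neither of which this sketch attempts to reproduce.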

Taxonomy

Topics: Reinforcement Learning in Robotics