A Look at Value-Based Decision-Time vs. Background Planning Methods Across Different Settings
Safa Alver, Doina Precup

TL;DR
This paper compares value-based decision-time and background planning methods in reinforcement learning, providing theoretical insights and experiments that show modern decision-time methods often outperform background methods in various settings.
Contribution
It offers the first theoretical comparison of value-based decision-time and background planning, and empirically validates their performance differences in modern instantiations.
Findings
Modern value-based decision-time planning can outperform background planning.
Simplest instantiations perform similarly, but modern versions differ.
Theoretical results support experimental findings.
Abstract
In model-based reinforcement learning (RL), an agent can leverage a learned model in different ways to improve its behavior. Two prevalent ways of doing so are decision-time planning and background planning. In this study, we are interested in understanding how the value-based versions of these two planning methods compare against each other across different settings. Towards this goal, we first consider the simplest instantiations of value-based decision-time and background planning methods and provide theoretical results on which one will perform better in the regular RL and transfer learning settings. Then, we consider their modern instantiations and provide hypotheses on which one will perform better in the same settings. Finally, we perform illustrative experiments to validate these theoretical results and hypotheses. Overall, our findings suggest…
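To make the distinction in the abstract concrete, here is a minimal tabular sketch of the two planning styles in their textbook sense. It is not the paper's algorithms or experimental setup. The chain environment, the function names (`make_chain_model`, `background_planning`, `decision_time_action`), the discount factor, and the choice to treat lookahead leaves as having zero value are all assumptions made for illustration.

```python
import random

ACTIONS = (0, 1)   # hypothetical action set: 0 = left, 1 = right
GAMMA = 0.9        # assumed discount factor

def make_chain_model(n_states=5):
    """Hypothetical deterministic chain model: (s, a) -> (reward, next_state, done).
    Reaching the last state yields reward 1 and ends the episode."""
    model = {}
    for s in range(n_states - 1):
        for a in ACTIONS:
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s2 == n_states - 1
            model[(s, a)] = (1.0 if done else 0.0, s2, done)
    return model

def background_planning(model, n_updates=2000, alpha=0.5, seed=0):
    """Background planning (Dyna-style): use simulated transitions from the
    model to update a Q-table ahead of time; acting is then a greedy lookup."""
    rng = random.Random(seed)
    keys = sorted(model)
    Q = {k: 0.0 for k in keys}
    for _ in range(n_updates):
        s, a = rng.choice(keys)
        r, s2, done = model[(s, a)]
        target = r if done else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q

def greedy_action(Q, s):
    """Act from the precomputed Q-table without any further planning."""
    return max(ACTIONS, key=lambda a: Q[(s, a)])

def decision_time_action(model, s, depth=4):
    """Decision-time planning: run a depth-limited lookahead from the current
    state only, at the moment an action is needed. Leaves are valued at 0 here;
    a learned value estimate could be substituted at the leaves."""
    def q_value(state, a, d):
        r, s2, done = model[(state, a)]
        if done or d == 1:
            return r
        return r + GAMMA * max(q_value(s2, b, d - 1) for b in ACTIONS)
    return max(ACTIONS, key=lambda a: q_value(s, a, depth))

model = make_chain_model()
Q = background_planning(model)
print([greedy_action(Q, s) for s in range(4)])
print([decision_time_action(model, s) for s in range(4)])
```

In this sketch both planners use the same model. The background planner spends its computation before acting and stores the result in `Q`, while the decision-time planner stores nothing and recomputes a local plan at every state it visits.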
Taxonomy
Topics: Reinforcement Learning in Robotics
