On shallow planning under partial observability
Randy Lefebvre, Audrey Durand

TL;DR
This paper studies how the discount factor affects the bias-variance trade-off of planning in reinforcement learning, and finds that a shorter planning horizon can be beneficial, especially under partial observability.
Contribution
It provides an analysis of the bias-variance trade-off related to the discount factor in partially observable environments, guiding better planning horizon choices.
Findings
A shorter planning horizon (a smaller discount factor) can be beneficial, especially under partial observability.
The impact of the discount factor on the bias-variance trade-off depends on structural parameters of the underlying MDP.
Guidelines for selecting discount factors in real-world RL applications.
Abstract
Formulating a real-world problem under the Reinforcement Learning framework involves non-trivial design choices, such as selecting a discount factor for the learning objective (discounted cumulative rewards), which articulates the planning horizon of the agent. This work investigates the impact of the discount factor on the bias-variance trade-off given structural parameters of the underlying Markov Decision Process. Our results support the idea that a shorter planning horizon might be beneficial, especially under partial observability.
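To make the link between the discount factor and the planning horizon concrete, here is a minimal Python sketch. It is not taken from the paper: the function names are hypothetical, and the effective horizon 1 / (1 - gamma) is the usual rule of thumb for discounted objectives rather than a definition the authors state.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t] for a finite reward sequence."""
    total = 0.0
    # Accumulate from the last reward backwards (Horner-style evaluation).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def effective_horizon(gamma):
    """Rule-of-thumb planning horizon 1 / (1 - gamma), for 0 <= gamma < 1.

    Assumption: this heuristic is illustrative and is not quoted from the paper.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    # A single reward of 10 that arrives after four steps of no reward.
    delayed = [0.0, 0.0, 0.0, 0.0, 10.0]
    for gamma in (0.5, 0.9, 0.99):
        print(gamma, effective_horizon(gamma), discounted_return(delayed, gamma))
```

With a smaller gamma the delayed reward contributes much less to the objective, so the agent effectively plans over fewer steps; this is the sense in which the discount factor sets the planning horizon.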
Taxonomy
Topics: AI-based Problem Solving and Planning · Logic, Reasoning, and Knowledge
