On shallow planning under partial observability

Randy Lefebvre; Audrey Durand

arXiv:2407.15820 · cs.AI · February 19, 2025

TL;DR

This paper studies how the discount factor affects planning in reinforcement learning under partial observability, highlighting that shorter horizons can be advantageous in certain scenarios.

Contribution

The paper analyzes how the discount factor affects the bias-variance trade-off, given structural parameters of the underlying Markov Decision Process (MDP) and with particular attention to partial observability, to inform the choice of planning horizon.

Findings

01

A shorter planning horizon might be beneficial for the bias-variance trade-off, especially under partial observability.

02

The impact of the discount factor varies with the structural parameters of the underlying MDP.

03

The results can inform the choice of discount factor when formulating real-world RL problems.

Abstract

Formulating a real-world problem under the Reinforcement Learning framework involves non-trivial design choices, such as selecting a discount factor for the learning objective (discounted cumulative rewards), which articulates the planning horizon of the agent. This work investigates the impact of the discount factor on the bias-variance trade-off given structural parameters of the underlying Markov Decision Process. Our results support the idea that a shorter planning horizon might be beneficial, especially under partial observability.
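
To make the link between the discount factor and the planning horizon concrete, here is a minimal Python sketch. It is not taken from the paper or its repository: the function names `discounted_return` and `effective_horizon` are hypothetical, and the horizon estimate 1 / (1 - gamma) is the common rule of thumb for discounted objectives, used here as an assumption rather than as the authors' definition.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t] for a finite reward sequence."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total


def effective_horizon(gamma):
    """Rule-of-thumb planning horizon 1 / (1 - gamma), for gamma in [0, 1).

    Assumption for illustration: this is the usual heuristic, not a
    quantity defined in the paper.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    rewards = [1.0] * 50  # hypothetical constant reward stream
    for gamma in (0.5, 0.9, 0.99):
        print(gamma, effective_horizon(gamma), discounted_return(rewards, gamma))
```

A smaller gamma shrinks the weight placed on distant rewards, so the agent effectively plans over fewer steps; this is the sense in which a lower discount factor corresponds to shallower planning.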

Code & Models

Repositories

graal-research/shallow-planning-partial-observability (Official)

Videos

On Shallow Planning Under Partial Observability · Underline

Taxonomy

Topics: AI-based Problem Solving and Planning · Logic, Reasoning, and Knowledge