Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy

Cameron Allen; Aaron Kirtland; Ruo Yu Tao; Sam Lobel; Daniel Scott; Nicholas Petrocelli; Omer Gottesman; Ronald Parr; Michael L. Littman; George Konidaris

arXiv:2407.07333 · cs.LG · November 18, 2024

TL;DR

This paper introduces the λ-discrepancy, a metric that detects non-Markovian state representations in partially observable environments. Minimizing it lets a reinforcement learning agent learn memory functions that mitigate partial observability.

Contribution

The paper proposes the λ-discrepancy metric to identify partial observability and demonstrates how minimizing it enhances learning in partially observable environments.

Findings

01 λ-discrepancy is zero in Markov decision processes

02 Minimizing λ-discrepancy improves learning in POMDPs

03 Proposed method outperforms single-value network baselines

Abstract

Reinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable, how can an agent learn such a state representation, and how can it detect when it has found one? We introduce a metric that can accomplish both objectives, without requiring access to, or knowledge of, an underlying, unobservable state space. Our metric, the $\lambda$-discrepancy, is the difference between two distinct temporal difference (TD) value estimates, each computed using TD($\lambda$) with a different value of $\lambda$. Since TD($\lambda = 0$) makes an implicit Markov assumption and TD($\lambda = 1$) does not, a discrepancy between these estimates is a potential indicator of a non-Markovian state representation. Indeed, we prove that…
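To make the idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a fixed policy, a deterministic observation map, per-observation state values rather than action values, and undiscounted visitation counts as aliasing weights. The toy environment and all names (`lambda_discrepancy`, `occupancy`, `hidden_state_values`) are hypothetical. The TD(1) estimate is taken to be the Monte Carlo return averaged over the hidden states that share an observation. The TD(0) estimate is the fixed point of a one-step backup that treats each observation as if it were a Markov state.

```python
def hidden_state_values(T, r, gamma, sweeps=200):
    """Values of the hidden states under the fixed policy (iterative evaluation)."""
    n = len(T)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [r[s] + gamma * sum(T[s][t] * v[t] for t in range(n)) for s in range(n)]
    return v


def occupancy(T, start, horizon=50):
    """Expected (undiscounted) visit counts of each hidden state."""
    n = len(T)
    occ = [0.0] * n
    dist = list(start)
    for _ in range(horizon):
        occ = [o + d for o, d in zip(occ, dist)]
        dist = [sum(dist[s] * T[s][t] for s in range(n)) for t in range(n)]
    return occ


def lambda_discrepancy(T, r, obs, start, gamma, sweeps=200):
    """Per-observation |V_TD(1) - V_TD(0)|, plus both value estimates.

    Assumes every observation is visited with non-zero probability.
    """
    n, n_obs = len(T), max(obs) + 1
    v = hidden_state_values(T, r, gamma)
    occ = occupancy(T, start)

    # w[s] = Pr(hidden state s | its observation), from visit counts.
    total = [0.0] * n_obs
    for s in range(n):
        total[obs[s]] += occ[s]
    w = [occ[s] / total[obs[s]] for s in range(n)]

    # TD(1) / Monte Carlo: average the true returns of aliased hidden states.
    v_mc = [0.0] * n_obs
    for s in range(n):
        v_mc[obs[s]] += w[s] * v[s]

    # TD(0): bootstrap from the value of the next *observation*.
    v_td = [0.0] * n_obs
    for _ in range(sweeps):
        new = [0.0] * n_obs
        for s in range(n):
            backup = r[s] + gamma * sum(T[s][t] * v_td[obs[t]] for t in range(n))
            new[obs[s]] += w[s] * backup
        v_td = new

    disc = [abs(a - b) for a, b in zip(v_mc, v_td)]
    return disc, v_mc, v_td


# Toy chain: start in "up" or "down" with equal probability, pass through a
# corridor, then terminate with reward +1 (up branch) or -1 (down branch).
# Hidden states: 0 = up, 1 = down, 2 = corridor after up, 3 = corridor after down.
# Rows that sum to zero terminate the episode.
T = [[0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
r = [0.0, 0.0, 1.0, -1.0]      # reward received on leaving each state
start = [0.5, 0.5, 0.0, 0.0]
gamma = 0.9

# Aliased: both corridor states emit the same observation (partially observable).
disc_aliased, _, _ = lambda_discrepancy(T, r, [0, 1, 2, 2], start, gamma)
# Markov: every hidden state has its own observation.
disc_markov, _, _ = lambda_discrepancy(T, r, [0, 1, 2, 3], start, gamma)

print("aliased:", disc_aliased)
print("markov: ", disc_markov)
```

In this toy chain the aliased corridor has an average value of zero, so TD(0) propagates zero back to both start observations, while the Monte Carlo estimate still sees returns of +0.9 and -0.9. The discrepancy is therefore 0.9 at each start observation and zero at the corridor. With distinct corridor observations, the two estimates agree everywhere.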

Code & Models

Repositories

brownirl/lambda_discrepancy (JAX, official)

Videos

Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy (SlidesLive)

Taxonomy

Topics: Simulation Techniques and Applications