Reconciling Rewards with Predictive State Representations

Andrea Baisero; Christopher Amato

arXiv:2106.03926 · cs.AI · February 1, 2022

TL;DR

This paper introduces reward-predictive state representations (R-PSRs), which accurately model both observations and rewards in POMDPs. Traditional PSRs ignore rewards, which limits their usefulness for control and planning.

Contribution

The paper defines an accuracy condition for PSRs to model rewards, proposes R-PSRs as a generalization, and develops value iteration methods for them.

Findings

01. A non-trivial number of POMDPs from a well-known third-party repository do not satisfy the accuracy condition for PSRs.

02. R-PSRs can accurately model rewards and observations.

03. Optimal R-PSR policies match optimal POMDP policies.
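The accuracy condition itself is not spelled out on this page. As a rough illustration only, the sketch below assumes the condition amounts to a linear-span check: the rewards for each action must be expressible as a linear combination of the test-outcome columns that the PSR can predict. The function name `rewards_in_span`, the matrix `U`, and the toy reward vectors are hypothetical and not taken from the paper or its repository.

```python
import numpy as np

def rewards_in_span(outcomes, rewards, tol=1e-8):
    """Return True if every reward column lies in the column space of `outcomes`.

    ASSUMPTION: this linear-span test stands in for the paper's accuracy
    condition, which is not stated on this page.
    """
    # Least-squares fit of each reward column as a combination of outcome columns.
    coeffs, *_ = np.linalg.lstsq(outcomes, rewards, rcond=None)
    residual = outcomes @ coeffs - rewards
    return bool(np.max(np.abs(residual)) <= tol)

# Hypothetical 3-state example: rows are states, columns are test outcomes.
# States 0 and 1 have identical rows, so tests cannot tell them apart.
U = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])

R_ok = np.array([[2.0], [2.0], [5.0]])   # states 0 and 1 share a reward
R_bad = np.array([[2.0], [3.0], [5.0]])  # states 0 and 1 differ only in reward

print(rewards_in_span(U, R_ok))
print(rewards_in_span(U, R_bad))
```

In this toy setup the second reward vector fails the check because two states that produce identical test outcomes carry different rewards, which is the kind of mismatch that observation-only predictions cannot capture.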

Abstract

Predictive state representations (PSRs) are models of controlled non-Markov observation sequences which exhibit the same generative process governing POMDP observations without relying on an underlying latent state. In that respect, a PSR is indistinguishable from the corresponding POMDP. However, PSRs notoriously ignore the notion of rewards, which undermines the general utility of PSR models for control, planning, or reinforcement learning. Therefore, we describe a sufficient and necessary accuracy condition which determines whether a PSR is able to accurately model POMDP rewards, we show that rewards can be approximated even when the accuracy condition is not satisfied, and we find that a non-trivial number of POMDPs taken from a well-known third-party repository do not satisfy the accuracy condition. We propose reward-predictive state representations (R-PSRs), a generalization of…

Code & Models

Repositories

abaisero/rl-rpsr (Official)