TL;DR
This paper introduces reward-predictive state representations (R-PSRs) that accurately model both observations and rewards in POMDPs, addressing limitations of traditional PSRs for control and planning tasks.
Contribution
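The paper states a necessary and sufficient accuracy condition that determines whether a PSR can accurately model POMDP rewards, shows that rewards can still be approximated when the condition fails, proposes R-PSRs as a generalization, and develops value iteration methods for R-PSRs.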
Findings
A non-trivial number of POMDPs from a well-known third-party repository do not satisfy the accuracy condition for PSRs.
R-PSRs can accurately model rewards and observations.
Optimal R-PSR policies match optimal POMDP policies.
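The summary does not spell out the accuracy condition itself. As a purely illustrative sketch, assume a linear PSR whose state is the belief projected through an outcome matrix U (one row per latent state, one column per core test, columns linearly independent). Under that assumption, the rewards of an action are exactly representable when its reward vector r lies in the column space of U, and a least-squares fit gives an approximation otherwise. The function names, the matrix U, and the tolerance are hypothetical and are not taken from the paper.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("columns of U must be linearly independent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def reward_fit(U, r):
    """Return least-squares weights w minimising ||U w - r|| and the max absolute residual."""
    rows, cols = len(U), len(U[0])
    # Normal equations: (U^T U) w = U^T r
    UtU = [[sum(U[s][i] * U[s][j] for s in range(rows)) for j in range(cols)]
           for i in range(cols)]
    Utr = [sum(U[s][i] * r[s] for s in range(rows)) for i in range(cols)]
    w = solve(UtU, Utr)
    residual = max(abs(sum(U[s][j] * w[j] for j in range(cols)) - r[s])
                   for s in range(rows))
    return w, residual


def is_accurate(U, r, tol=1e-9):
    """Assumed check: rewards are representable iff the fit residual is (numerically) zero."""
    return reward_fit(U, r)[1] <= tol


# Hypothetical 3-state, 2-core-test example.
U_example = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
r_representable = [2.0, 3.0, 5.0]      # equals U_example times [2, 3]
r_not_representable = [2.0, 3.0, 0.0]  # outside the column space of U_example
```

In this toy setup, `is_accurate(U_example, r_representable)` holds while `is_accurate(U_example, r_not_representable)` does not. In the second case the weights returned by `reward_fit` still give a best linear approximation of the rewards.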
Abstract
Predictive state representations (PSRs) are models of controlled non-Markov observation sequences which exhibit the same generative process governing POMDP observations without relying on an underlying latent state. In that respect, a PSR is indistinguishable from the corresponding POMDP. However, PSRs notoriously ignore the notion of rewards, which undermines the general utility of PSR models for control, planning, or reinforcement learning. Therefore, we describe a sufficient and necessary accuracy condition which determines whether a PSR is able to accurately model POMDP rewards, we show that rewards can be approximated even when the accuracy condition is not satisfied, and we find that a non-trivial number of POMDPs taken from a well-known third-party repository do not satisfy the accuracy condition. We propose reward-predictive state representations (R-PSRs), a generalization of…
