Toward Learning POMDPs Beyond Full-Rank Actions and State Observability
Seiji Shaw, Travis Manderson, Chad Kessens, and Nicholas Roy

TL;DR
This paper introduces a spectral method for learning POMDP models beyond full-rank actions and state observability, enabling flexible planning with learned models in partially observable domains.
Contribution
It presents a novel approach to learn POMDP parameters up to a similarity transform using tensor decomposition under mild rank assumptions, extending beyond full-rank actions.
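To illustrate what "up to a similarity transform" means, here is a minimal NumPy sketch. It is not the paper's algorithm. It assumes a hypothetical single-action model with randomly drawn matrices, and all names (`T`, `O`, `S`, `seq_prob`) are illustrative. Applying an invertible matrix `S` to the per-observation operators leaves every observation-sequence probability unchanged, so sequence data alone cannot tell the two parameterizations apart.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_obs = 3, 2

# Hypothetical model (single action for brevity): rows of T and O sum to 1.
T = rng.dirichlet(np.ones(n_states), size=n_states)   # T[s, s']
O = rng.dirichlet(np.ones(n_obs), size=n_states)      # O[s', o]
b0 = rng.dirichlet(np.ones(n_states))                 # initial state distribution

# One operator per observation: transition, then emit o from the new state.
ops = [T @ np.diag(O[:, o]) for o in range(n_obs)]
ones = np.ones(n_states)

def seq_prob(start, operators, end, seq):
    """Probability of an observation sequence as a product of operators."""
    v = start
    for o in seq:
        v = v @ operators[o]
    return float(v @ end)

# Arbitrary invertible change of basis (assumed well-conditioned here).
S = rng.normal(size=(n_states, n_states)) + 3.0 * np.eye(n_states)
S_inv = np.linalg.inv(S)

ops_t = [S_inv @ A @ S for A in ops]   # transformed operators
b0_t = b0 @ S                          # transformed start vector
end_t = S_inv @ ones                   # transformed end vector

seq = [0, 1, 1, 0]
p = seq_prob(b0, ops, ones, seq)
p_t = seq_prob(b0_t, ops_t, end_t, seq)
print(p, p_t)  # the two values agree up to floating-point error
```

The transformed operators are generally no longer stochastic matrices, which is one reason recovering an explicit transition and observation model takes additional structure beyond matching sequence probabilities.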
Findings
The method learns the POMDP matrices up to a partition of the state space.
Explicit observation and transition models enable planning for different rewards.
Learning beyond a state partition is impossible from sequential data.
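The point about reusing an explicit model with different rewards can be sketched as follows. This is an illustrative toy, not the paper's planner. The 2-state model, the reward matrices `R1` and `R2`, and the one-step greedy rule (standing in for a full POMDP planner) are all assumptions made for the example.

```python
import numpy as np

# Hypothetical 2-state, 2-action, 2-observation model (numbers are made up).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],    # T[a, s, s']
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[0.8, 0.2],                  # O[s', o]
              [0.3, 0.7]])

def belief_update(b, a, o):
    """Bayes filter: predict with T[a], then weight by the likelihood of o."""
    predicted = b @ T[a]
    unnormalized = predicted * O[:, o]
    return unnormalized / unnormalized.sum()

def greedy_action(b, R):
    """Pick the action with the highest expected immediate reward, R[a, s]."""
    return int(np.argmax(R @ b))

b = belief_update(np.array([0.5, 0.5]), a=0, o=0)

# Two different reward functions evaluated against the same model and belief.
R1 = np.array([[1.0, 0.0], [0.0, 0.0]])   # rewards action 0 in state 0
R2 = np.array([[0.0, 0.0], [0.0, 1.0]])   # rewards action 1 in state 1

print(b, greedy_action(b, R1), greedy_action(b, R2))
```

Only the reward matrix changes between the two calls; the transition and observation models, and the belief computed from them, are shared.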
Abstract
We are interested in enabling autonomous agents to learn and reason about systems with hidden states, such as locking mechanisms. We cast this problem as learning the parameters of a discrete Partially Observable Markov Decision Process (POMDP). The agent begins with knowledge of the POMDP's action and observation spaces, but not its state space, transitions, or observation models. These properties must be constructed from a sequence of actions and observations. Spectral approaches to learning models of partially observable domains, such as Predictive State Representations (PSRs), learn representations of state that are sufficient to predict future outcomes. PSR models, however, do not have explicit transition and observation system models that can be used with different reward functions to solve different planning problems. Under a mild set of rank assumptions on the products of…
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Robot Manipulation and Learning
