Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight
Jiacheng Guo, Minshuo Chen, Huan Wang, Caiming Xiong, Mengdi Wang, Yu Bai

TL;DR
This paper introduces a new feedback model for POMDPs called 'multiple observations in hindsight' and demonstrates that it enables sample-efficient learning for two new subclasses, relaxing previous assumptions and broadening applicability.
Contribution
The paper proposes a novel feedback model and establishes sample-efficient learning algorithms for multi-observation revealing and distinguishable POMDPs, expanding the classes of POMDPs that can be learned efficiently.
Findings
Sample-efficient learning is achievable under the new feedback model.
Two new subclasses of POMDPs are identified for efficient learning.
Distinguishable POMDPs require weaker conditions than revealing POMDPs.
Abstract
This paper studies the sample efficiency of learning in Partially Observable Markov Decision Processes (POMDPs), a challenging problem in reinforcement learning that is known to be exponentially hard in the worst case. Motivated by real-world settings such as loading in game playing, we propose an enhanced feedback model called "multiple observations in hindsight", where after each episode of interaction with the POMDP, the learner may collect multiple additional observations emitted from the encountered latent states, but may not observe the latent states themselves. We show that sample-efficient learning under this feedback model is possible for two new subclasses of POMDPs: multi-observation revealing POMDPs and distinguishable POMDPs. Both subclasses generalize and substantially relax revealing POMDPs, a widely studied subclass for which sample-efficient…
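The feedback model described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's algorithm: it assumes a tabular POMDP, and the names `sample`, `run_episode_with_hindsight`, `T`, `O`, `mu0`, and `num_extra` are hypothetical and chosen only for illustration. How `num_extra` relates to the paper's own notation is not specified in the text above.

```python
import random


def sample(dist, rng):
    """Draw an index from a discrete probability vector."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def run_episode_with_hindsight(T, O, mu0, policy, horizon, num_extra, rng):
    """Roll out one episode, then emit extra observations in hindsight.

    Assumed (hypothetical) tabular layout:
      T[a][s]  next-state distribution after action a in latent state s
      O[s]     emission distribution of latent state s
      mu0      initial latent-state distribution
    The latent states are used internally but never returned to the learner.
    """
    s = sample(mu0, rng)
    visited, trajectory = [], []
    for h in range(horizon):
        o = sample(O[s], rng)      # observation seen during the episode
        a = policy(h, o)           # the policy only sees observations
        visited.append(s)
        trajectory.append((o, a))
        s = sample(T[a][s], rng)
    # Hindsight phase: additional observations drawn from the same
    # latent states that were encountered during the episode.
    hindsight = [[sample(O[s_h], rng) for _ in range(num_extra)]
                 for s_h in visited]
    return trajectory, hindsight


# Usage: two latent states, two actions, three observations.
T = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.5, 0.5]]]
O = [[0.7, 0.2, 0.1],
     [0.1, 0.2, 0.7]]
trajectory, hindsight = run_episode_with_hindsight(
    T, O, [0.5, 0.5], lambda h, o: o % 2, horizon=4, num_extra=3,
    rng=random.Random(0))
```

The function returns only observations and actions, which mirrors the restriction that the learner may not observe the latent states themselves.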
Taxonomy
Topics: Reinforcement Learning in Robotics
