Lower Bounds for Learning in Revealing POMDPs
Fan Chen, Huan Wang, Caiming Xiong, Song Mei, Yu Bai

TL;DR
This paper establishes fundamental lower bounds for reinforcement learning in revealing POMDPs, showing that learning complexity is inherently higher than in fully observable MDPs, especially for multi-step scenarios.
Contribution
It provides the first strong PAC and regret lower bounds for revealing POMDPs, clarifying the intrinsic difficulty and scaling of learning in these models.
Findings
In multi-step revealing POMDPs, the PAC sample complexity scales with the number of latent states S as at least Ω(S^{1.5}).
Any polynomial sublinear regret in multi-step revealing POMDPs is at least Ω(T^{2/3}).
Distribution testing techniques are used to construct hard instances, a novel approach in RL.
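To make the two rates in the findings above concrete, the following minimal sketch compares them with the standard rates for fully observable tabular MDPs, assuming those baselines are linear in S for PAC sample complexity and of order T^{1/2} for regret. It ignores constants, logarithmic factors, and every other problem parameter, and the function names `pac_gap` and `regret_gap` are hypothetical helpers for illustration, not anything from the paper.

```python
def pac_gap(num_states: int) -> float:
    """Ratio S^{1.5} / S: the extra factor in the revealing-POMDP PAC lower
    bound relative to an assumed linear-in-S MDP baseline. Equals sqrt(S)."""
    return num_states ** 1.5 / num_states


def regret_gap(num_rounds: int) -> float:
    """Ratio T^{2/3} / T^{1/2}: the extra factor in the regret lower bound
    relative to an assumed sqrt(T) MDP baseline. Equals T^{1/6}."""
    return num_rounds ** (2 / 3) / num_rounds ** 0.5


if __name__ == "__main__":
    # Print how the gaps grow with the problem size (orders of magnitude only).
    for s in (4, 100, 10_000):
        print(f"S = {s:>6}: PAC gap factor    ~ {pac_gap(s):.1f}")
    for t in (10**3, 10**6, 10**9):
        print(f"T = {t:>10}: regret gap factor ~ {regret_gap(t):.1f}")
```

Under these assumptions the PAC gap grows like the square root of S and the regret gap like T^{1/6}, so both separations widen as the problem gets larger.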
Abstract
This paper studies the fundamental limits of reinforcement learning (RL) in the challenging partially observable setting. While it is well-established that learning in Partially Observable Markov Decision Processes (POMDPs) requires exponentially many samples in the worst case, a surge of recent work shows that polynomial sample complexities are achievable under the revealing condition, a natural condition that requires the observables to reveal some information about the unobserved latent states. However, the fundamental limits for learning in revealing POMDPs are much less understood, with existing lower bounds being rather preliminary and having substantial gaps from the current best upper bounds. We establish strong PAC and regret lower bounds for learning in revealing POMDPs. Our lower bounds scale polynomially in all relevant problem parameters in a…
Taxonomy
Topics: Machine Learning and Algorithms · Advanced Bandit Algorithms Research · Bayesian Modeling and Causal Inference
