Provable Partially Observable Reinforcement Learning with Privileged Information
Yang Cai, Xiangyu Liu, Argyris Oikonomou, Kaiqing Zhang

TL;DR
This paper provides a theoretical analysis of reinforcement learning with privileged information under partial observability, proposing algorithms with provable efficiency and analyzing their sample and computational complexities.
Contribution
The paper formalizes expert distillation and asymmetric actor-critic methods for partially observable RL with privileged information, and analyzes their sample and computational complexities.
Findings
Without the deterministic filter condition, expert distillation can fail to find near-optimal policies.
The belief-weighted asymmetric actor-critic achieves polynomial sample and quasi-polynomial computational complexity.
Algorithms for multi-agent RL with privileged information are developed with provable efficiency.
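The first finding can be made concrete with a toy example. The sketch below is not taken from the paper: it assumes a hypothetical one-step problem with two equally likely hidden states, an observation that carries no information about the state, and three actions (two risky guesses and one safe action). All names and reward values (`reward`, `expert_action`, `distill_student`, the payoffs 1.0, -2.0 and 0.5) are illustrative assumptions. A state-aware expert always guesses correctly, while a student that imitates the expert's action distribution from the uninformative observation ends up mixing the two risky guesses and does worse than simply playing the safe action.

```python
# Toy one-step partially observable problem (hypothetical, for illustration only).
STATES = [0, 1]
ACTIONS = [0, 1, 2]          # 0 and 1 are risky guesses, 2 is a safe action
PRIOR = {0: 0.5, 1: 0.5}     # the student sees nothing that distinguishes the states


def reward(state, action):
    """Assumed payoffs: safe action pays 0.5; a guess pays 1.0 if right, -2.0 if wrong."""
    if action == 2:
        return 0.5
    return 1.0 if action == state else -2.0


def expert_action(state):
    """Privileged expert: sees the state and picks the best action for it."""
    return max(ACTIONS, key=lambda a: reward(state, a))


def distill_student():
    """Action distribution the student imitates when its observation is uninformative:
    the expert's actions averaged over the hidden state."""
    probs = {a: 0.0 for a in ACTIONS}
    for s in STATES:
        probs[expert_action(s)] += PRIOR[s]
    return probs


def value(policy):
    """Expected reward of a state-independent (observation-only) stochastic policy."""
    return sum(PRIOR[s] * policy[a] * reward(s, a) for s in STATES for a in ACTIONS)


student = distill_student()
expert_value = sum(PRIOR[s] * reward(s, expert_action(s)) for s in STATES)
student_value = value(student)
# Best policy that ignores the state: with one step and a constant observation,
# checking the deterministic policies is enough.
best_blind_value = max(
    value({a: 1.0 if a == b else 0.0 for a in ACTIONS}) for b in ACTIONS
)

print("expert:", expert_value)
print("distilled student:", student_value)
print("best observation-only policy:", best_blind_value)
```

In this toy setup the distilled student is strictly worse than the best policy available without the state, which is the kind of gap the finding refers to. It says nothing about the paper's actual constructions or bounds.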
Abstract
Partial observability of the underlying states generally presents significant challenges for reinforcement learning (RL). In practice, certain privileged information, e.g., the access to states from simulators, has been exploited in training and has achieved prominent empirical successes. To better understand the benefits of privileged information, we revisit and examine several simple and practically used paradigms in this setting. Specifically, we first formalize the empirical paradigm of expert distillation (also known as teacher-student learning), demonstrating its pitfall in finding near-optimal policies. We then identify a condition of the partially observable environment, the deterministic filter condition, under which expert distillation achieves sample and computational complexities that are both polynomial. Furthermore, we investigate another…
Taxonomy
Topics: Reinforcement Learning in Robotics · Elevator Systems and Control · Adaptive Dynamic Programming Control
