Provable Partially Observable Reinforcement Learning with Privileged Information

Yang Cai; Xiangyu Liu; Argyris Oikonomou; Kaiqing Zhang

arXiv:2412.00985·cs.LG·February 24, 2025

TL;DR

This paper provides a theoretical analysis of reinforcement learning with privileged information under partial observability, proposing algorithms with provable efficiency and analyzing their sample and computational complexities.

Contribution

It formalizes and analyzes the sample and computational complexities of expert distillation and asymmetric actor-critic methods in partially observable RL with privileged information.

Findings

01. Expert distillation can be inefficient without the deterministic filter condition.

02. The belief-weighted asymmetric actor-critic achieves polynomial sample and quasi-polynomial computational complexity.

03. Algorithms for multi-agent RL with privileged information are developed with provable efficiency.
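
The findings above refer to beliefs over hidden states and to a "deterministic filter". As a rough illustration only, the sketch below implements the standard Bayes filter for a small tabular POMDP and checks whether the resulting belief is one-hot. Everything in it is an assumption made for illustration: the names `belief_update` and `is_one_hot`, the two-state toy model, and the informal reading of "deterministic filter" as "the posterior collapses to a single state". The paper's precise definition is not reproduced on this page and may differ.

```python
def belief_update(belief, transition, emission, obs):
    """One standard Bayes-filter step for a fixed action (illustrative only).

    belief[s]         : prior probability of hidden state s
    transition[s][s2] : P(s2 | s, action), with the action held fixed
    emission[s2][o]   : P(o | s2)
    obs               : index of the observation received
    """
    n = len(belief)
    # Predict: push the prior through the transition kernel.
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n))
                 for s2 in range(n)]
    # Correct: reweight each state by the likelihood of the observation.
    unnorm = [predicted[s2] * emission[s2][obs] for s2 in range(n)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnorm]


def is_one_hot(belief, tol=1e-9):
    """True if (numerically) all probability mass sits on one state."""
    return max(belief) >= 1.0 - tol


# Hypothetical two-state toy model: the next state is a fair coin flip.
transition = [[0.5, 0.5], [0.5, 0.5]]
revealing = [[1.0, 0.0], [0.0, 1.0]]  # observation identifies the state
noisy = [[0.8, 0.2], [0.2, 0.8]]      # observation is only a noisy hint

prior = [1.0, 0.0]
b_revealing = belief_update(prior, transition, revealing, obs=1)
b_noisy = belief_update(prior, transition, noisy, obs=1)
print(is_one_hot(b_revealing), is_one_hot(b_noisy))
```

In the revealing model the posterior stays concentrated on one state after every step, so a policy that sees only observations can track the state a privileged expert sees. In the noisy model the posterior stays spread out, which is the kind of setting where imitating a state-based expert can go wrong.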

Abstract

Partial observability of the underlying states generally presents significant challenges for reinforcement learning (RL). In practice, certain privileged information, e.g., the access to states from simulators, has been exploited in training and has achieved prominent empirical successes. To better understand the benefits of privileged information, we revisit and examine several simple and practically used paradigms in this setting. Specifically, we first formalize the empirical paradigm of expert distillation (also known as teacher-student learning), demonstrating its pitfall in finding near-optimal policies. We then identify a condition of the partially observable environment, the deterministic filter condition, under which expert distillation achieves sample and computational complexities that are both polynomial. Furthermore, we investigate another…

Taxonomy

Topics: Reinforcement Learning in Robotics · Elevator Systems and Control · Adaptive Dynamic Programming Control
