Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, and Karthik Sridharan

TL;DR
This paper introduces an algorithm for agnostic reinforcement learning in rich observation spaces that bounds error based on the MDP's rank, addressing practical limitations of previous realizability assumptions.
Contribution
It presents a novel algorithm with sample complexity bounds for agnostic RL in low-rank MDPs, and establishes a lower bound showing exponential dependence on rank is unavoidable.
Findings
Sample complexity depends exponentially on the rank of the MDP.
The algorithm's guarantee holds even when the policy class contains no near-optimal policy: it competes with the best policy in the class rather than with the optimal policy.
A nearly matching lower bound shows that the exponential dependence on rank is unavoidable.
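As a hedged illustration of the agnostic objective behind these findings, the learner is asked to return a policy whose value is within the target sub-optimality of the best policy in the class, not of the optimal policy. The notation below ($\Pi$, $V^{\pi}$, $\hat{\pi}$, $\epsilon$) is assumed for illustration and is not taken from this page:

```latex
% Agnostic guarantee (sketch; notation assumed for illustration):
%   \Pi        the fixed policy class
%   V^{\pi}    the expected return of policy \pi
%   \hat{\pi}  the policy returned by the learner
%   \epsilon   the desired sub-optimality
V^{\hat{\pi}} \;\ge\; \max_{\pi \in \Pi} V^{\pi} - \epsilon
```

When $\Pi$ happens to contain a near-optimal policy, the right-hand side is close to the optimal value; in the agnostic setting it may fall well below it.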
Abstract
There have been many recent advances on provably efficient Reinforcement Learning (RL) in problems with rich observation spaces. However, all these works share a strong realizability assumption about the optimal value function of the true MDP. Such realizability assumptions are often too strong to hold in practice. In this work, we consider the more realistic setting of agnostic RL with rich observation spaces and a fixed class of policies $\Pi$ that may not contain any near-optimal policy. We provide an algorithm for this setting whose error is bounded in terms of the rank $d$ of the underlying MDP. Specifically, our algorithm enjoys a sample complexity bound of $\widetilde{\mathcal{O}}\left((H^{4d} K^{3d} \log |\Pi|)/\epsilon^2\right)$ where $H$ is the length of episodes, $K$ is the number of actions and $\epsilon > 0$ is the desired sub-optimality. We also provide a nearly matching lower bound…
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms
