Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations

Christoph Dann; Yishay Mansour; Mehryar Mohri; Ayush Sekhari; Karthik Sridharan

arXiv:2106.11519 · cs.LG · June 23, 2021 · 1 citation

TL;DR

This paper introduces an algorithm for agnostic reinforcement learning in rich observation spaces that bounds error based on the MDP's rank, addressing practical limitations of previous realizability assumptions.

Contribution

It presents a novel algorithm with sample complexity bounds for agnostic RL in low-rank MDPs, and establishes a lower bound showing exponential dependence on rank is unavoidable.

Findings

01 Sample complexity scales as $H^{4d} K^{3d}$, which is exponential in the rank $d$ of the MDP.

02 The guarantee holds even when the policy class $Π$ contains no near-optimal policy.

03 A nearly matching lower bound shows that this difficulty is inherent to the problem.
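
The agnostic guarantee behind the second finding can be written as a comparison against the best policy in the class. This is a sketch in notation introduced here for illustration, not taken from this page: $V(π)$ is assumed to denote the expected return of a policy $π$, and $\hat{π}$ the policy the algorithm outputs. Such guarantees are typically required to hold with high probability.

```latex
V(\hat{\pi}) \;\ge\; \max_{\pi \in \Pi} V(\pi) \;-\; \epsilon
```

The benchmark is the best policy in $Π$, not the optimal policy of the MDP, which is why no realizability assumption is needed.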

Abstract

There have been many recent advances on provably efficient Reinforcement Learning (RL) in problems with rich observation spaces. However, all these works share a strong realizability assumption about the optimal value function of the true MDP. Such realizability assumptions are often too strong to hold in practice. In this work, we consider the more realistic setting of agnostic RL with rich observation spaces and a fixed class of policies $Π$ that may not contain any near-optimal policy. We provide an algorithm for this setting whose error is bounded in terms of the rank $d$ of the underlying MDP. Specifically, our algorithm enjoys a sample complexity bound of $O\left(H^{4d} K^{3d} \log |Π| / ϵ^{2}\right)$ where $H$ is the length of episodes, $K$ is the number of actions and $ϵ > 0$ is the desired sub-optimality. We also provide a nearly matching lower bound…
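
To get a feel for how the stated bound scales, the short Python sketch below evaluates only its leading term, $H^{4d} K^{3d} \log |Π| / ϵ^{2}$. The function name and the example values are hypothetical, and the constants hidden by the $O(\cdot)$ notation are not given in the abstract, so the results are useful for comparing settings rather than as absolute sample counts.

```python
import math


def sample_complexity_leading_term(H, K, d, policy_class_size, eps):
    """Leading term H^(4d) * K^(3d) * log|Pi| / eps^2 of the stated bound.

    Constants hidden by the O(.) notation are unknown and omitted,
    so only ratios between settings are meaningful.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    if policy_class_size < 1:
        raise ValueError("policy class must contain at least one policy")
    return (H ** (4 * d)) * (K ** (3 * d)) * math.log(policy_class_size) / eps ** 2


# Hypothetical setting: horizon 10, 2 actions, 1000 policies, eps = 0.1.
base = sample_complexity_leading_term(H=10, K=2, d=1, policy_class_size=1000, eps=0.1)
higher = sample_complexity_leading_term(H=10, K=2, d=2, policy_class_size=1000, eps=0.1)

# Raising the rank from 1 to 2 multiplies the term by H^4 * K^3.
ratio = higher / base
print(f"d=1: {base:.3e}, d=2: {higher:.3e}, ratio: {ratio:.1f}")
```

Each unit increase in the rank $d$ multiplies the term by $H^{4} K^{3}$, while the dependence on the policy class is only logarithmic in $|Π|$.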

Videos

Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms