Sample-Efficient Reinforcement Learning in the Presence of Exogenous Information

Yonathan Efroni, Dylan J. Foster, Dipendra Misra, Akshay Krishnamurthy, and John Langford

arXiv:2206.04282 · cs.LG · June 10, 2022

TL;DR

This paper introduces the ExoMDP framework and ExoRL algorithm, demonstrating that reinforcement learning can be made sample-efficient even with high-dimensional, irrelevant exogenous information by focusing on the controllable part of the state.

Contribution

The paper defines the ExoMDP setting, proposes the ExoRL algorithm, and proves that ExoRL learns a near-optimal policy with sample complexity that is polynomial in the size of the endogenous component and nearly independent of the size of the exogenous component.

Findings

01. ExoRL achieves sample complexity polynomial in the size of the endogenous state space.

02. Its sample complexity is nearly independent of the size of the exogenous component.

03. First demonstration of sample-efficient RL in the presence of exogenous information.

Abstract

In real-world reinforcement learning applications the learner's observation space is ubiquitously high-dimensional with both relevant and irrelevant information about the task at hand. Learning from high-dimensional observations has been the subject of extensive investigation in supervised learning and statistics (e.g., via sparsity), but analogous issues in reinforcement learning are not well understood, even in finite state/action (tabular) domains. We introduce a new problem setting for reinforcement learning, the Exogenous Markov Decision Process (ExoMDP), in which the state space admits an (unknown) factorization into a small controllable (or, endogenous) component and a large irrelevant (or, exogenous) component; the exogenous component is independent of the learner's actions, but evolves in an arbitrary, temporally correlated fashion. We provide a new algorithm, ExoRL, which…
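The factorization described in the abstract can be illustrated with a toy simulator. The following is a minimal sketch under stated assumptions, not the paper's construction or the ExoRL algorithm: the chain-shaped endogenous dynamics, the bit-flip exogenous noise, and all names (`N_ENDO`, `N_EXO_BITS`, `step`, `rollout`) are hypothetical choices made for illustration. It also exposes the endogenous/exogenous split directly, whereas in the ExoMDP setting the factorization is unknown to the learner.

```python
import random

N_ENDO = 4        # hypothetical: small controllable component, positions 0..3
N_EXO_BITS = 16   # hypothetical: large irrelevant component, 2**16 configurations


def step(state, action, rng):
    """Advance one step; the two components evolve separately."""
    endo, exo = state
    # Endogenous part: depends only on (endo, action).
    move = 1 if action == 1 else -1
    next_endo = min(max(endo + move, 0), N_ENDO - 1)
    # Exogenous part: depends only on exo, never on the action.
    # Each bit is kept with probability 0.9, so it is temporally correlated.
    next_exo = tuple(b if rng.random() < 0.9 else 1 - b for b in exo)
    # Reward depends only on the endogenous part.
    reward = 1.0 if next_endo == N_ENDO - 1 else 0.0
    return (next_endo, next_exo), reward


def rollout(actions, seed):
    """Run a fixed action sequence and record both components."""
    rng = random.Random(seed)
    state = (0, tuple(rng.randint(0, 1) for _ in range(N_EXO_BITS)))
    endo_path, exo_path, total = [state[0]], [state[1]], 0.0
    for action in actions:
        state, reward = step(state, action, rng)
        endo_path.append(state[0])
        exo_path.append(state[1])
        total += reward
    return endo_path, exo_path, total


if __name__ == "__main__":
    endo_a, exo_a, ret_a = rollout([1, 1, 1, 1], seed=0)
    endo_b, exo_b, ret_b = rollout([0, 0, 0, 0], seed=0)
    print("endogenous paths:", endo_a, endo_b)
    print("exogenous paths identical:", exo_a == exo_b)
    print("returns:", ret_a, ret_b)
```

With the same seed, the two rollouts share an exogenous trajectory even though the actions differ, while the endogenous trajectory and the return change with the actions. A learner that treats the full state as one tabular state would face `N_ENDO * 2**N_EXO_BITS` states here, which is the kind of blow-up the endogenous-only dependence is meant to avoid.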

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques