Sample-Efficient Reinforcement Learning in the Presence of Exogenous Information
Yonathan Efroni, Dylan J. Foster, Dipendra Misra, Akshay Krishnamurthy, and John Langford

TL;DR
This paper introduces the ExoMDP framework and ExoRL algorithm, demonstrating that reinforcement learning can be made sample-efficient even with high-dimensional, irrelevant exogenous information by focusing on the controllable part of the state.
Contribution
The paper defines the ExoMDP setting, proposes the ExoRL algorithm, and proves that ExoRL learns a near-optimal policy with sample complexity that is polynomial in the size of the endogenous component and nearly independent of the size of the exogenous component.
Findings
ExoRL achieves sample complexity polynomial in the size of the endogenous component.
Sample complexity is nearly independent of the exogenous component size.
First demonstration of sample-efficient RL with exogenous information.
Abstract
In real-world reinforcement learning applications the learner's observation space is ubiquitously high-dimensional with both relevant and irrelevant information about the task at hand. Learning from high-dimensional observations has been the subject of extensive investigation in supervised learning and statistics (e.g., via sparsity), but analogous issues in reinforcement learning are not well understood, even in finite state/action (tabular) domains. We introduce a new problem setting for reinforcement learning, the Exogenous Markov Decision Process (ExoMDP), in which the state space admits an (unknown) factorization into a small controllable (or, endogenous) component and a large irrelevant (or, exogenous) component; the exogenous component is independent of the learner's actions, but evolves in an arbitrary, temporally correlated fashion. We provide a new algorithm, ExoRL, which…
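The factorization described in the abstract can be made concrete with a toy simulator. The following is a minimal sketch, not the paper's construction or the ExoRL algorithm: the class name `ToyExoMDP`, the chain-walk dynamics, the 0.9 bit-persistence probability, and the reward are all illustrative assumptions. It shows only the defining property of the setting: the small endogenous component responds to actions, while the large exogenous component evolves in a temporally correlated way that never reads the action.

```python
import random


class ToyExoMDP:
    """Hypothetical toy environment (not from the paper).

    State = (endogenous position, exogenous bit_1, ..., exogenous bit_n).
    A learner in the ExoMDP setting would NOT be told which coordinates
    are endogenous; the layout is fixed here only for readability.
    """

    def __init__(self, n_positions=5, n_exo_bits=20, seed=0):
        self.n_positions = n_positions
        self.n_exo_bits = n_exo_bits
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.position = 0
        self.exo = tuple(self.rng.randint(0, 1) for _ in range(self.n_exo_bits))
        return self._state()

    def _state(self):
        return (self.position,) + self.exo

    def step(self, action):
        # Endogenous update: action in {-1, +1} moves along a chain,
        # clipped to the valid range of positions.
        self.position = min(max(self.position + action, 0), self.n_positions - 1)
        # Exogenous update: each bit keeps its previous value with
        # probability 0.9 (temporally correlated) and ignores the action.
        self.exo = tuple(
            b if self.rng.random() < 0.9 else 1 - b for b in self.exo
        )
        # Assumed reward: depends only on the endogenous component.
        reward = 1.0 if self.position == self.n_positions - 1 else 0.0
        return self._state(), reward


if __name__ == "__main__":
    # Two copies with the same seed but different action sequences
    # produce identical exogenous trajectories.
    env_a, env_b = ToyExoMDP(seed=1), ToyExoMDP(seed=1)
    for _ in range(10):
        state_a, _ = env_a.step(+1)
        state_b, _ = env_b.step(-1)
    print(state_a[1:] == state_b[1:])
```

In this sketch the endogenous component takes only 5 values while the full state space has 5 × 2^20 elements, which is the kind of gap the setting is meant to capture: a method that treats every full state as distinct scales with the larger number, whereas the abstract's goal is to depend on the smaller one.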
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques
