Exploiting Exogenous Structure for Sample-Efficient Reinforcement Learning
Jia Wan, Sean R. Sinclair, Devavrat Shah, Martin J. Wainwright

TL;DR
This paper introduces Exo-MDPs, a structured class of MDPs whose states split into exogenous and endogenous components. It analyzes their structural properties, derives regret bounds, and demonstrates their sample-efficiency benefits through experiments.
Contribution
The paper establishes a representational equivalence between Exo-MDPs and linear mixture MDPs and derives regret bounds that show their sample-efficiency advantages.
Findings
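Discrete MDPs, Exo-MDPs, and discrete linear mixture MDPs are representationally equivalent.
Regret upper bound of $O(H^{3/2} d \sqrt{K})$ for unobserved exogenous states, over $K$ episodes of horizon $H$, where $d$ is the size of the exogenous state space.
Sample complexity decouples from the sizes of the action and endogenous state spaces.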
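The decoupling finding can be illustrated with a simple certainty-equivalence sketch. This is not the paper's algorithm, which addresses the harder setting where exogenous states are unobserved. The sketch assumes a hypothetical inventory problem in which the demand (the exogenous state) is observed and drawn i.i.d., and the stock update `f` is known. Under those assumptions the only unknown is the demand distribution, so the same demand samples pin down the model for every stock level and every order size. All names (`MAX_STOCK`, `f`, `reward`, `plug_in_values`) are illustrative.

```python
import numpy as np

# Hypothetical inventory problem; sizes and prices are illustrative only.
MAX_STOCK, N_ORDERS, N_DEMANDS = 5, 3, 4

def f(x, a, xi):
    """Known deterministic endogenous update: restock (capped), then serve demand."""
    return max(min(x + a, MAX_STOCK) - xi, 0)

def reward(x, a, xi, price=2.0, cost=1.0):
    """Revenue from units sold minus ordering cost."""
    return price * min(min(x + a, MAX_STOCK), xi) - cost * a

def plug_in_values(demand_samples, horizon):
    """Estimate the demand distribution, then run finite-horizon value iteration.

    Only N_DEMANDS - 1 free parameters are estimated, no matter how many
    stock levels (endogenous states) or order sizes (actions) there are.
    """
    q_hat = np.bincount(demand_samples, minlength=N_DEMANDS) / len(demand_samples)
    V = np.zeros(MAX_STOCK + 1)  # terminal values are zero
    for _ in range(horizon):
        Q = np.zeros((MAX_STOCK + 1, N_ORDERS))
        for x in range(MAX_STOCK + 1):
            for a in range(N_ORDERS):
                # Expected one-step reward plus continuation value under q_hat.
                Q[x, a] = sum(q_hat[xi] * (reward(x, a, xi) + V[f(x, a, xi)])
                              for xi in range(N_DEMANDS))
        V = Q.max(axis=1)
    return q_hat, V

rng = np.random.default_rng(0)
samples = rng.integers(0, N_DEMANDS, size=1000)
q_hat, V = plug_in_values(samples, horizon=3)
```

Growing `MAX_STOCK` or `N_ORDERS` enlarges the planning loop but leaves the statistical problem (estimating `q_hat`) unchanged, which is the intuition behind the decoupling claim.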
Abstract
We study Exo-MDPs, a structured class of Markov Decision Processes (MDPs) where the state space is partitioned into exogenous and endogenous components. Exogenous states evolve stochastically, independent of the agent's actions, while endogenous states evolve deterministically based on both state components and actions. Exo-MDPs are useful for applications including inventory control, portfolio management, and ride-sharing. Our first result is structural, establishing a representational equivalence between the classes of discrete MDPs, Exo-MDPs, and discrete linear mixture MDPs. Specifically, any discrete MDP can be represented as an Exo-MDP, and the transition and reward dynamics can be written as linear functions of the exogenous state distribution, showing that Exo-MDPs are instances of linear mixture MDPs. For unobserved exogenous states, we prove a regret upper bound of…
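To make the structure concrete, here is a minimal sketch of a hypothetical inventory-control Exo-MDP. Stock on hand is the endogenous state, customer demand is the exogenous state, and the stock update is deterministic given stock, order size, and demand. For simplicity the sketch assumes demand is drawn i.i.d. from a distribution `q`, which is a special case of the stochastic exogenous process described above. The names `MAX_STOCK`, `f`, `reward`, and `model` are illustrative, not taken from the paper. The sketch checks numerically that the transition and expected-reward tables are linear in `q`. In this special case, each demand value ξ contributes a fixed feature 1{f(x, a, ξ) = x'} weighted by q(ξ), which is the linear-mixture form.

```python
import numpy as np

# Hypothetical inventory-control Exo-MDP; sizes and prices are illustrative.
MAX_STOCK = 5           # endogenous state x: stock on hand, 0..MAX_STOCK
N_ORDERS = 3            # action a: order 0, 1, or 2 units
N_DEMANDS = 4           # exogenous state xi: demand 0..3, drawn i.i.d. from q

def f(x, a, xi):
    """Deterministic endogenous update: restock (capped), then serve demand."""
    return max(min(x + a, MAX_STOCK) - xi, 0)

def reward(x, a, xi, price=2.0, cost=1.0):
    """Revenue from units sold minus ordering cost."""
    return price * min(min(x + a, MAX_STOCK), xi) - cost * a

def model(q):
    """Return P[x, a, x'] and R[x, a]; both are linear in the demand distribution q."""
    n_x = MAX_STOCK + 1
    P = np.zeros((n_x, N_ORDERS, n_x))
    R = np.zeros((n_x, N_ORDERS))
    for x in range(n_x):
        for a in range(N_ORDERS):
            for xi in range(N_DEMANDS):
                P[x, a, f(x, a, xi)] += q[xi]   # weight q(xi) on feature 1{f(x, a, xi) = x'}
                R[x, a] += q[xi] * reward(x, a, xi)
    return P, R

q1 = np.array([0.1, 0.2, 0.3, 0.4])
q2 = np.full(N_DEMANDS, 0.25)
P1, R1 = model(q1)
P2, R2 = model(q2)
P_mix, R_mix = model(0.3 * q1 + 0.7 * q2)
print(np.allclose(P_mix, 0.3 * P1 + 0.7 * P2), np.allclose(R_mix, 0.3 * R1 + 0.7 * R2))
# prints: True True
```

Mixing two demand distributions mixes the resulting models with the same weights, so learning the model reduces to learning `q`.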
