Exploiting Exogenous Structure for Sample-Efficient Reinforcement Learning

Jia Wan; Sean R. Sinclair; Devavrat Shah; Martin J. Wainwright

arXiv:2409.14557 · stat.ML · February 6, 2025

TL;DR

This paper studies Exo-MDPs, a structured class of MDPs whose state space splits into exogenous and endogenous components. It analyzes their structural properties, proves regret bounds, and demonstrates sample-efficiency gains in experiments.

Contribution

It establishes a representational equivalence between Exo-MDPs and linear mixture MDPs, and derives regret bounds showing sample efficiency advantages.

Findings

01

Discrete MDPs, Exo-MDPs, and discrete linear mixture MDPs are representationally equivalent.

02

Regret upper bound of $O(H^{3/2} d \sqrt{K})$ for unobserved exogenous states.

03

Sample complexity decouples from the sizes of the action space and the endogenous state space.

Abstract

We study Exo-MDPs, a structured class of Markov Decision Processes (MDPs) where the state space is partitioned into exogenous and endogenous components. Exogenous states evolve stochastically, independent of the agent's actions, while endogenous states evolve deterministically based on both state components and actions. Exo-MDPs are useful for applications including inventory control, portfolio management, and ride-sharing. Our first result is structural, establishing a representational equivalence between the classes of discrete MDPs, Exo-MDPs, and discrete linear mixture MDPs. Specifically, any discrete MDP can be represented as an Exo-MDP, and the transition and reward dynamics can be written as linear functions of the exogenous state distribution, showing that Exo-MDPs are instances of linear mixture MDPs. For unobserved exogenous states, we prove a regret upper bound of…
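
To make the structure described in the abstract concrete, here is a minimal Python sketch of an inventory-control Exo-MDP. It is an illustration only, not code from the paper or its repository: the capacity, demand values, probabilities, prices, and every function name are hypothetical, and the demand is assumed to be drawn i.i.d. at each step as a simplification. The exogenous state is the demand, the endogenous state is the stock level, and `transition_row` shows how the endogenous transition probabilities come out as a linear function of the exogenous distribution.

```python
import random

# Hypothetical inventory-control instance (all names and numbers are illustrative).
CAPACITY = 5                          # endogenous state x: stock on hand, 0..CAPACITY
DEMANDS = [0, 1, 2, 3]                # exogenous state xi: customer demand
DEMAND_PROBS = [0.1, 0.4, 0.3, 0.2]   # distribution of xi, unaffected by actions


def next_stock(stock, order, demand):
    """Deterministic endogenous update f(x, a, xi)."""
    return max(min(stock + order, CAPACITY) - demand, 0)


def reward(stock, order, demand, price=2.0, cost=1.0):
    """Deterministic reward r(x, a, xi): sales revenue minus ordering cost."""
    sold = min(min(stock + order, CAPACITY), demand)
    return price * sold - cost * order


def transition_row(stock, order, demand_probs):
    """P(x' | x, a) = sum over xi of p(xi) * 1[f(x, a, xi) = x'].

    Each entry is a weighted sum of demand_probs, so the row is linear in
    the exogenous distribution.
    """
    row = [0.0] * (CAPACITY + 1)
    for demand, p in zip(DEMANDS, demand_probs):
        row[next_stock(stock, order, demand)] += p
    return row


def simulate(policy, horizon, seed=0):
    """Roll out one episode; randomness enters only through the demand draw."""
    rng = random.Random(seed)
    stock, total = 0, 0.0
    for _ in range(horizon):
        order = policy(stock)
        demand = rng.choices(DEMANDS, weights=DEMAND_PROBS)[0]
        total += reward(stock, order, demand)
        stock = next_stock(stock, order, demand)
    return total
```

For example, `simulate(lambda stock: max(3 - stock, 0), horizon=10)` rolls out an order-up-to-3 policy. In this sketch the only unknown is the demand distribution, which has as many entries as there are demand values, regardless of how many stock levels or order quantities exist.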

Code & Models

Repositories

jw3479/exogenous_mdps (Official)

Taxonomy

Topics: Data Stream Mining Techniques · Face and Expression Recognition · Reinforcement Learning in Robotics