Exploiting Action Impact Regularity and Exogenous State Variables for Offline Reinforcement Learning

Vincent Liu; James R. Wright; Martha White

arXiv:2111.08066·cs.LG·May 16, 2023·1 cites

TL;DR

This paper introduces a new class of MDPs with the Action Impact Regularity property, enabling effective offline reinforcement learning by exploiting the limited impact of actions on exogenous state components, with theoretical guarantees and empirical validation.

Contribution

The work defines the AIR property for MDPs, develops algorithms leveraging this property, and provides theoretical analysis and empirical results showing improved offline RL performance.

Findings

1. The proposed algorithm outperforms existing offline RL methods in AIR-structured environments.

2. The AIR property holds in several real-world domains, including financial markets.

3. Theoretical guarantees are established for an algorithm based on Fitted-Q Iteration.

Abstract

Offline reinforcement learning -- learning a policy from a batch of data -- is known to be hard for general MDPs. These results motivate the need to look at specific classes of MDPs where offline reinforcement learning might be feasible. In this work, we explore a restricted class of MDPs to obtain guarantees for offline reinforcement learning. The key property, which we call Action Impact Regularity (AIR), is that actions primarily impact a part of the state (an endogenous component) and have limited impact on the remaining part of the state (an exogenous component). AIR is a strong assumption, but it nonetheless holds in a number of real-world domains including financial markets. We discuss algorithms that exploit the AIR property, and provide a theoretical analysis for an algorithm based on Fitted-Q Iteration. Finally, we demonstrate that the algorithm outperforms existing offline…
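The AIR structure described in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm. It assumes the idealized case where actions have no impact at all on the exogenous component (the abstract only says the impact is limited), and it assumes the endogenous dynamics and reward are known. The trading setup and the names `endogenous_step`, `reward`, and `counterfactual_transitions` are hypothetical and chosen only for illustration. Under these assumptions, one logged exogenous trajectory (here, prices) can be replayed with every endogenous state and action to produce transitions the behavior policy never visited.

```python
# Assumption: exact AIR, i.e. the price series (exogenous) is unaffected by our actions.
# Assumption: the position dynamics and reward (endogenous part) are known in closed form.

def endogenous_step(position, action, max_position=1):
    """Hypothetical known dynamics: the action shifts the position, clipped to a range."""
    return max(-max_position, min(max_position, position + action))

def reward(position_next, price, price_next):
    """Hypothetical reward: profit from holding the new position over one price move."""
    return position_next * (price_next - price)

def counterfactual_transitions(prices, positions=(-1, 0, 1), actions=(-1, 0, 1)):
    """Replay one logged price trajectory under every (position, action) pair."""
    transitions = []
    for t in range(len(prices) - 1):
        for pos in positions:
            for a in actions:
                pos_next = endogenous_step(pos, a)
                r = reward(pos_next, prices[t], prices[t + 1])
                state = (prices[t], pos)
                next_state = (prices[t + 1], pos_next)
                transitions.append((state, a, r, next_state))
    return transitions

logged_prices = [100.0, 101.0, 99.0]  # made-up exogenous trajectory
data = counterfactual_transitions(logged_prices)
```

A batch built this way could then be passed to a value-based learner such as Fitted-Q Iteration. If actions do influence the exogenous component, the replayed transitions are only approximations.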

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management