Information-Theoretic Considerations in Batch Reinforcement Learning
Jinglin Chen, Nan Jiang

TL;DR
This paper explores the theoretical foundations of batch reinforcement learning, analyzing the assumptions needed for value-function approximation guarantees and clarifying their necessity and naturalness.
Contribution
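It provides theoretical results on the two kinds of assumptions behind finite-sample guarantees for batch value-function approximation (mild distribution shift, and representation conditions stronger than realizability), addressing why they are needed and when they hold.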
Findings
Examines why mild distribution shift assumptions are needed for finite-sample guarantees
Examines when representation conditions stronger than realizability hold
Takes steps towards a deeper understanding of value-function approximation in RL
Abstract
Value-function approximation methods that operate in batch mode have foundational importance to reinforcement learning (RL). Finite sample guarantees for these methods often crucially rely on two types of assumptions: (1) mild distribution shift, and (2) representation conditions that are stronger than realizability. However, the necessity ("why do we need them?") and the naturalness ("when do they hold?") of such assumptions have largely eluded the literature. In this paper, we revisit these assumptions and provide theoretical results towards answering the above questions, and make steps towards a deeper understanding of value-function approximation.
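For readers unfamiliar with the two assumption types named in the abstract, the block below sketches how they are commonly formalized in the batch RL literature. It is an illustrative sketch, not necessarily the paper's exact statements: the symbols (data distribution \mu, function class \mathcal{F}, Bellman optimality operator \mathcal{T}, constant C) are assumptions introduced here, and "completeness" is shown only as one typical example of a condition stronger than realizability.

```latex
% Mild distribution shift, often stated as a concentrability condition:
% every state-action distribution \nu reachable by some policy is covered
% by the data distribution \mu up to a constant C < \infty.
\[
  \sup_{s,a} \frac{\nu(s,a)}{\mu(s,a)} \le C
  \quad \text{for all admissible } \nu .
\]

% Realizability: the function class contains the optimal value function.
\[
  Q^\star \in \mathcal{F} .
\]

% Completeness (stronger than realizability when \mathcal{F} is finite):
% the class is closed under the Bellman optimality operator.
\[
  \mathcal{T} f \in \mathcal{F} \quad \text{for all } f \in \mathcal{F},
  \qquad
  (\mathcal{T} f)(s,a) = R(s,a) + \gamma \,
  \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\!\left[ \max_{a'} f(s',a') \right] .
\]
```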
Taxonomy
Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Scheduling and Optimization Algorithms
