On Value Functions and the Agent-Environment Boundary
Nan Jiang

TL;DR
This paper investigates how the definition of value functions in reinforcement learning depends on where the agent-environment boundary is drawn, and proposes a boundary-invariant analysis of Fitted Q-Iteration to resolve the resulting ambiguity in optimality guarantees.
Contribution
It introduces a boundary-invariant way of analyzing RL algorithms, demonstrated on Fitted Q-Iteration, in which both the assumptions and the guarantees stay the same regardless of how the agent-environment boundary is drawn.
Findings
Develops a boundary-invariant analysis of Fitted Q-Iteration
Discusses closely related issues in state resetting and Monte-Carlo Tree Search
Discusses implications for deterministic vs stochastic systems
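Fitted Q-Iteration (FQI) repeatedly regresses a Q-function onto one-step Bellman targets computed from a fixed batch of transitions. The following is a minimal sketch of the standard algorithm, not the paper's boundary-invariant analysis. It assumes a tabular function class, for which the least-squares fit is simply the per-(state, action) average of the targets. The transition format `(s, a, r, s_next, done)` and the two-state toy dataset are hypothetical.

```python
def fitted_q_iteration(dataset, actions, gamma, n_iters):
    """Tabular FQI sketch (hypothetical data format: (s, a, r, s_next, done)).

    Each iteration builds regression targets r + gamma * max_b f_k(s', b)
    and fits f_{k+1} by least squares; over a tabular class that fit is the
    per-(s, a) mean of the targets.
    """
    q = {}  # f_0 = 0 everywhere (missing keys read as 0.0)
    for _ in range(n_iters):
        sums, counts = {}, {}
        for s, a, r, s_next, done in dataset:
            # Bootstrap from the previous iterate; no bootstrap at termination.
            bootstrap = 0.0 if done else max(q.get((s_next, b), 0.0) for b in actions)
            target = r + gamma * bootstrap
            sums[(s, a)] = sums.get((s, a), 0.0) + target
            counts[(s, a)] = counts.get((s, a), 0) + 1
        # Least-squares regression onto the targets = per-key average.
        q = {key: sums[key] / counts[key] for key in sums}
    return q


# Hypothetical deterministic toy problem: "go" in state 1 ends with reward 1.
ACTIONS = ["go", "stay"]
DATA = [
    (0, "go", 0.0, 1, False),
    (0, "stay", 0.0, 0, False),
    (1, "go", 1.0, None, True),
    (1, "stay", 0.0, 1, False),
]

q = fitted_q_iteration(DATA, ACTIONS, gamma=0.9, n_iters=200)
print({k: round(v, 4) for k, v in q.items()})
```

With deterministic data this reduces to value iteration on the empirical model, so with `gamma=0.9` the iterates approach Q(1, go) = 1, Q(1, stay) = 0.9, Q(0, go) = 0.9, and Q(0, stay) = 0.81. Swapping the per-key average for a different regressor (for example, a linear model over features) gives FQI with function approximation, which is the setting in which the boundary question arises.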
Abstract
When function approximation is deployed in reinforcement learning (RL), the same problem may be formulated in different ways, often by treating a pre-processing step either as part of the environment or as part of the agent. As a consequence, fundamental concepts in RL, such as (optimal) value functions, are not uniquely defined, since they depend on where we draw this agent-environment boundary, causing problems in theoretical analyses that provide optimality guarantees. We address this issue via a simple and novel boundary-invariant analysis of Fitted Q-Iteration, a representative RL algorithm, where the assumptions and the guarantees are invariant to the choice of boundary. We also discuss closely related issues concerning state resetting and Monte-Carlo Tree Search, deterministic vs. stochastic systems, imitation learning, and the verifiability of theoretical assumptions from data.
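To make the boundary dependence concrete, here is a hypothetical toy example, not taken from the paper. A pre-processing map `phi` collapses two latent states into one observation, and the episode lasts a single step. If `phi` is placed inside the agent, the environment's states are the latent states and Q* is defined over them. If `phi` is placed inside the environment, the agent's "state" is the observation, and the resulting Q-function averages over the latent states according to the start distribution (assumed uniform here). The sketch also checks whether the first Q* can be written as a function of `(phi(s), a)`, since FQI analyses commonly assume the target Q-function lies in the function class (realizability), and that assumption can hold under one boundary and fail under the other.

```python
from collections import defaultdict

# Hypothetical toy problem: two latent states that phi maps to one observation.
LATENT_STATES = ["s1", "s2"]
ACTIONS = ["a", "b"]
INIT_DIST = {"s1": 0.5, "s2": 0.5}  # assumed uniform start distribution


def reward(s, act):
    # Action "a" pays off only in s1, action "b" only in s2.
    return 1.0 if (s, act) in {("s1", "a"), ("s2", "b")} else 0.0


def phi(s):
    # Pre-processing step that discards which latent state we are in.
    return "o"


def q_phi_in_agent():
    # Boundary 1: phi is part of the agent, so environment states are latent
    # states. One-step episode, hence Q*(s, act) = reward(s, act).
    return {(s, act): reward(s, act) for s in LATENT_STATES for act in ACTIONS}


def q_phi_in_environment():
    # Boundary 2: phi is part of the environment, so the agent's state is
    # phi(s). Q(o, act) is the start-distribution-weighted average reward
    # over the latent states that map to o.
    num, den = defaultdict(float), defaultdict(float)
    for s in LATENT_STATES:
        for act in ACTIONS:
            num[(phi(s), act)] += INIT_DIST[s] * reward(s, act)
            den[(phi(s), act)] += INIT_DIST[s]
    return {key: num[key] / den[key] for key in num}


def representable_through_phi(q):
    # q over (s, act) lies in {f(phi(s), act)} iff it is constant across
    # latent states that share an observation.
    seen = {}
    for (s, act), value in q.items():
        key = (phi(s), act)
        if key in seen and seen[key] != value:
            return False
        seen[key] = value
    return True


q1 = q_phi_in_agent()
q2 = q_phi_in_environment()
# Optimal start-state value under each boundary.
v1 = sum(INIT_DIST[s] * max(q1[(s, act)] for act in ACTIONS) for s in LATENT_STATES)
v2 = max(q2[("o", act)] for act in ACTIONS)
print(v1, v2, representable_through_phi(q1))  # prints: 1.0 0.5 False
```

The same underlying system yields an optimal value of 1.0 under one boundary and 0.5 under the other, and the first Q* is not a function of `phi(s)`, whereas the second is by construction. A guarantee stated as "FQI finds Q* when Q* is realizable" therefore means different things depending on the boundary, which is the kind of ambiguity the boundary-invariant analysis is meant to remove.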
