On Value Functions and the Agent-Environment Boundary

Nan Jiang

arXiv:1905.13341 · cs.LG · June 2, 2020 · 6 cites

TL;DR

This paper shows that value functions in reinforcement learning with function approximation depend on where the agent-environment boundary is drawn, and proposes a boundary-invariant analysis of Fitted Q-Iteration whose assumptions and guarantees do not depend on that choice.

Contribution

It gives a simple, novel boundary-invariant analysis of Fitted Q-Iteration, a representative RL algorithm, in which both the assumptions and the guarantees hold regardless of how the agent-environment boundary is drawn.

Findings

01 Develops a boundary-invariant analysis of Fitted Q-Iteration

02 Discusses closely related issues in state resetting and Monte-Carlo Tree Search

03 Discusses deterministic vs stochastic systems

Abstract

When function approximation is deployed in reinforcement learning (RL), the same problem may be formulated in different ways, often by treating a pre-processing step as a part of the environment or as part of the agent. As a consequence, fundamental concepts in RL, such as (optimal) value functions, are not uniquely defined as they depend on where we draw this agent-environment boundary, causing problems in theoretical analyses that provide optimality guarantees. We address this issue via a simple and novel boundary-invariant analysis of Fitted Q-Iteration, a representative RL algorithm, where the assumptions and the guarantees are invariant to the choice of boundary. We also discuss closely related issues on state resetting and Monte-Carlo Tree Search, deterministic vs stochastic systems, imitation learning, and the verifiability of theoretical assumptions from data.
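To make the boundary question concrete, here is a minimal sketch of Fitted Q-Iteration with a linear Q-function per action, run under two formulations of the same toy problem. It is an illustration under stated assumptions, not the paper's analysis: the two-state MDP, the `one_hot` pre-processing map, and the function name `fitted_q_iteration` are all hypothetical choices made for this example.

```python
import numpy as np


def fitted_q_iteration(dataset, featurize, n_actions, gamma=0.9, n_iters=200):
    """Least-squares FQI with Q(s, a) = weights[a] @ featurize(s).

    dataset: list of (s, a, r, s_next) transitions.
    featurize: hypothetical pre-processing map from a state to a feature vector.
    """
    phi = np.array([featurize(s) for s, _, _, _ in dataset])
    phi_next = np.array([featurize(s2) for _, _, _, s2 in dataset])
    actions = np.array([a for _, a, _, _ in dataset])
    rewards = np.array([r for _, _, r, _ in dataset], dtype=float)

    weights = np.zeros((n_actions, phi.shape[1]))
    for _ in range(n_iters):
        # Regression targets: one-step Bellman backup of the current estimate.
        targets = rewards + gamma * (phi_next @ weights.T).max(axis=1)
        new_weights = np.zeros_like(weights)
        for a in range(n_actions):
            mask = actions == a
            if mask.any():
                new_weights[a] = np.linalg.lstsq(
                    phi[mask], targets[mask], rcond=None
                )[0]
        weights = new_weights
    return weights


def one_hot(s, n=2):
    """Assumed pre-processing step: integer state -> one-hot vector."""
    v = np.zeros(n)
    v[s] = 1.0
    return v


# Toy deterministic MDP (an assumption for illustration): action 0 stays,
# action 1 switches state; reward is 1 whenever the current state is 1.
raw = [(s, a, float(s == 1), s if a == 0 else 1 - s)
       for s in (0, 1) for a in (0, 1)]

# Boundary A: pre-processing is part of the agent (raw states in the data).
w_agent = fitted_q_iteration(raw, one_hot, n_actions=2)

# Boundary B: pre-processing is part of the environment (features in the data).
pre = [(one_hot(s), a, r, one_hot(s2)) for s, a, r, s2 in raw]
w_env = fitted_q_iteration(pre, lambda x: x, n_actions=2)
```

Both calls perform the same computation, so `w_agent` and `w_env` agree, and with `gamma=0.9` the estimate for staying in state 1 approaches 1 / (1 - 0.9) = 10. The algorithm itself is indifferent to where the boundary sits; what changes between the two formulations is what counts as the "state", and therefore which value functions the analysis refers to.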

Taxonomy

Topics: Reinforcement Learning in Robotics · Auction Theory and Applications · Economic theories and models

Methods: Monte-Carlo Tree Search