Inverse Policy Evaluation for Value-based Sequential Decision-making

Alan Chan; Kris de Asis; Richard S. Sutton

arXiv:2008.11329·cs.LG·August 27, 2020

TL;DR

This paper introduces inverse policy evaluation as a novel approach to derive behavior from value functions in reinforcement learning, especially when traditional greedy methods are unreliable due to approximation errors.

Contribution

The paper proposes a method that combines inverse policy evaluation with approximate value iteration, enabling value-based control even when the value function does not correspond to any policy.

Findings

1. Inverse policy evaluation can effectively derive policies from arbitrary value functions.

2. The combined method improves control in function approximation regimes.

3. Theoretical and empirical results support the feasibility of the approach.

Abstract

Value-based methods for reinforcement learning lack generally applicable ways to derive behavior from a value function. Many approaches involve approximate value iteration (e.g., $Q$-learning) and acting greedily with respect to the estimates, with an arbitrary degree of entropy to ensure that the state space is sufficiently explored. Behavior based on explicit greedification assumes that the values reflect those of some policy, over which the greedy policy will be an improvement. However, value iteration can produce value functions that do not correspond to any policy. This is especially relevant in the function-approximation regime, when the true value function cannot be perfectly represented. In this work, we explore the use of inverse policy evaluation, the process of solving for a likely policy given a value function, for deriving behavior from a value…
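
To make "solving for a likely policy given a value function" concrete, here is a minimal tabular sketch. It is not taken from the paper: it assumes a small MDP with known transitions `P` and rewards `r` (both made up for illustration), a softmax policy parameterization, and plain gradient descent on the squared gap between the given values and the policy's one-step Bellman backup. The function names are hypothetical.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with the usual max-shift for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def action_values(v, P, r, gamma):
    # q(s, a) = r(s, a) + gamma * sum_s' P(s' | s, a) v(s')
    return r + gamma * (P @ v)

def inverse_policy_evaluation(v, P, r, gamma, lr=1.0, steps=5000):
    """Find pi such that sum_a pi(a|s) q(s, a) is close to v(s) (assumed objective).

    v: (S,) given values, P: (S, A, S) transitions, r: (S, A) rewards.
    """
    q = action_values(v, P, r, gamma)
    logits = np.zeros_like(q)                  # start from the uniform policy
    for _ in range(steps):
        pi = softmax(logits)
        backup = (pi * q).sum(axis=1)          # one-step backup under pi
        err = backup - v
        # Gradient of 0.5 * err^2 with respect to the softmax logits.
        grad = err[:, None] * pi * (q - backup[:, None])
        logits -= lr * grad
    return softmax(logits)

# Hypothetical 2-state, 2-action MDP.
gamma = 0.5
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# Build a value function that does correspond to a known policy.
pi_true = np.array([[0.7, 0.3],
                    [0.2, 0.8]])
P_pi = np.einsum("sa,sat->st", pi_true, P)
r_pi = (pi_true * r).sum(axis=1)
v = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)

q = action_values(v, P, r, gamma)
pi_hat = inverse_policy_evaluation(v, P, r, gamma)
print(np.round(pi_hat, 3))
```

In this toy case the given values come from a real policy, so the backup can match them exactly. The backup in each state is a convex combination of that state's action values, so a value outside their range cannot be matched by any policy; under this assumed objective the procedure then returns the closest fit, which is one way to read the situation the abstract describes.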

Taxonomy

Topics: Reinforcement Learning in Robotics · Supply Chain and Inventory Management · Simulation Techniques and Applications