Identifying Critical States by the Action-Based Variance of Expected Return

Izumi Karino; Yoshiyuki Ohmura; Yasuo Kuniyoshi

arXiv:2008.11332·stat.ML·November 10, 2020

TL;DR

This paper identifies critical states in reinforcement learning by the action-based variance of expected return, which speeds up learning and makes the agent's behavior easier to explain.

Contribution

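It proposes detecting critical states, those at which action selection substantially changes the potential for success or failure, from the variance of the Q-function across actions, and exploiting with high probability at the identified states.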
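
The idea of scoring states by the variance of the Q-function over actions can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the quantile-based threshold, the function names, and the parameters `quantile`, `epsilon`, and `exploit_prob` are assumptions made for illustration only.

```python
import numpy as np


def action_variance(q_table):
    """Variance of Q(s, a) across actions, one value per state."""
    return np.var(q_table, axis=1)


def critical_states(q_table, quantile=0.9):
    """Boolean mask of states whose action variance exceeds a quantile.

    The quantile threshold is an assumed criterion for this sketch.
    """
    variance = action_variance(q_table)
    threshold = np.quantile(variance, quantile)
    return variance > threshold


def select_action(q_table, state, is_critical, rng,
                  epsilon=0.1, exploit_prob=0.95):
    """Epsilon-greedy choice that exploits more often at critical states."""
    # Probability of taking the greedy action (hypothetical values).
    p_greedy = exploit_prob if is_critical[state] else 1.0 - epsilon
    if rng.random() < p_greedy:
        return int(np.argmax(q_table[state]))
    # Otherwise explore uniformly over all actions.
    return int(rng.integers(q_table.shape[1]))


# Toy Q-table: 3 states, 2 actions. Only state 1 has a large gap
# between its action values, so only state 1 is flagged.
q = np.array([[1.0, 1.0],
              [0.0, 10.0],
              [2.0, 2.1]])
mask = critical_states(q, quantile=0.5)
rng = np.random.default_rng(0)
action = select_action(q, 1, mask, rng, exploit_prob=1.0)
```

In the toy table, state 1 stands out because picking the wrong action there costs far more than at the other states, which is the intuition behind treating such states specially.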

Findings

01. Accelerated RL in grid world and deep RL tasks.

02. Identified critical states are interpretable and crucial for decision-making.

03. Timing of critical state identification influences learning speed.

Abstract

The balance of exploration and exploitation plays a crucial role in accelerating reinforcement learning (RL). To deploy an RL agent in human society, its explainability is also essential. However, basic RL approaches have difficulties in deciding when to choose exploitation as well as in extracting useful points for a brief explanation of its operation. One reason for the difficulties is that these approaches treat all states the same way. Here, we show that identifying critical states and treating them specially is commonly beneficial to both problems. These critical states are the states at which the action selection changes the potential of success and failure substantially. We propose to identify the critical states using the variance in the Q-function for the actions and to perform exploitation with high probability on the identified states. These simple methods accelerate RL in a…
