Identifying Critical States by the Action-Based Variance of Expected Return
Izumi Karino, Yoshiyuki Ohmura, Yasuo Kuniyoshi

TL;DR
This paper identifies critical states in reinforcement learning from the variance of the expected return (the Q-function) across actions, and exploits with high probability at those states, which speeds up learning and makes the agent's behavior easier to explain.
Contribution
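The paper defines critical states as those where the choice of action substantially changes the chance of success or failure, detects them from the variance of the Q-function across actions, and exploits with high probability in the detected states.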
Findings
The method accelerated RL in a grid world and in deep RL tasks.
Identified critical states are interpretable and crucial for decision-making.
Timing of critical state identification influences learning speed.
Abstract
The balance of exploration and exploitation plays a crucial role in accelerating reinforcement learning (RL). To deploy an RL agent in human society, its explainability is also essential. However, basic RL approaches have difficulties in deciding when to choose exploitation as well as in extracting useful points for a brief explanation of its operation. One reason for the difficulties is that these approaches treat all states the same way. Here, we show that identifying critical states and treating them specially is commonly beneficial to both problems. These critical states are the states at which the action selection changes the potential of success and failure substantially. We propose to identify the critical states using the variance in the Q-function for the actions and to perform exploitation with high probability on the identified states. These simple methods accelerate RL in a…
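The idea in the abstract, scoring each state by how much the Q-values differ across actions and exploiting more often where that score is high, can be illustrated with a small tabular sketch. This is not the authors' implementation: the function names, the fixed `threshold`, and the two epsilon values are hypothetical choices made for illustration, and the paper may select critical states and set the exploitation probability differently.

```python
import numpy as np


def action_variance(q_values):
    """Variance of Q(s, a) across actions, one value per state.

    q_values: array of shape (n_states, n_actions).
    """
    return np.var(q_values, axis=1)


def critical_states(q_values, threshold):
    """Boolean mask of states whose action-based variance exceeds `threshold`.

    `threshold` is an assumed hyperparameter, not a value from the paper.
    """
    return action_variance(q_values) > threshold


def select_action(q_row, is_critical, rng, epsilon=0.3, epsilon_critical=0.01):
    """Epsilon-greedy choice with a smaller epsilon in critical states.

    Both epsilon values are illustrative assumptions.
    """
    eps = epsilon_critical if is_critical else epsilon
    if rng.random() < eps:
        # Explore: pick a uniformly random action.
        return int(rng.integers(len(q_row)))
    # Exploit: pick the action with the highest estimated return.
    return int(np.argmax(q_row))


# Toy Q-table: only state 1 has actions with very different returns.
q = np.array([
    [0.5, 0.5],    # actions are equivalent
    [1.0, -1.0],   # one action succeeds, the other fails
    [0.2, 0.25],   # nearly equivalent
])
mask = critical_states(q, threshold=0.1)
rng = np.random.default_rng(0)
action = select_action(q[1], bool(mask[1]), rng)
```

In this toy table only state 1 is flagged, because it is the only state where picking the wrong action changes the expected return substantially. Elsewhere the agent keeps exploring at the ordinary rate.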
