The Concept of Criticality in Reinforcement Learning
Yitzhak Spielberg, Amos Azaria

TL;DR
This paper extends n-step reinforcement learning so that each state has its own step number n, chosen with the help of human-provided criticality measures, with the aim of managing the bias-variance trade-off and improving learning efficiency.
Contribution
It extends n-step algorithms by allowing state-specific n values and incorporates human input on state criticality to enhance RL performance.
Findings
State-specific n-step updates improve learning efficiency.
Human-provided criticality measures guide optimal n selection.
The approach adapts the bias-variance trade-off dynamically.
Abstract
Reinforcement learning methods carry a well-known bias-variance trade-off in n-step algorithms for optimal control. Unfortunately, this has rarely been addressed in current research. This trade-off principle holds independently of the choice of algorithm, such as n-step SARSA, n-step Expected SARSA or n-step Tree Backup. A small n results in a large bias, while a large n leads to a large variance. The literature offers no straightforward recipe for the best choice of this value. While all current n-step algorithms use a fixed value of n over the state space, we extend the framework of n-step updates by allowing each state to have its specific n. We propose a solution to this problem within the context of human-aided reinforcement learning. Our approach is based on the observation that a human can learn more efficiently if she receives input regarding the criticality of a given state…
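To make the idea of a state-specific n concrete, here is a minimal sketch of an n-step SARSA target in which the step number is looked up from the state visited at time t. It is an illustration under stated assumptions, not the authors' implementation: the function name, the n_by_state lookup table, and the tabular q dictionary are all hypothetical, and the text above does not say how a criticality measure is converted into n, so the sketch simply takes the per-state n as given.

```python
from typing import Dict, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable


def n_step_sarsa_target(
    rewards: Sequence[float],
    states: Sequence[State],
    actions: Sequence[Action],
    q: Dict[Tuple[State, Action], float],
    t: int,
    n_by_state: Dict[State, int],  # hypothetical: per-state step number
    gamma: float = 0.99,
    default_n: int = 1,
) -> float:
    """Return the n-step SARSA target for time t.

    rewards[k] is the reward received after taking actions[k] in states[k].
    The episode has len(rewards) steps; the terminal state is not stored.
    The step number n is chosen by the state visited at time t.
    """
    T = len(rewards)
    n = max(1, n_by_state.get(states[t], default_n))
    end = min(t + n, T)

    # Discounted sum of the next n rewards (fewer if the episode ends first).
    g = 0.0
    for k in range(t, end):
        g += gamma ** (k - t) * rewards[k]

    # Bootstrap from the current estimate unless the episode has ended.
    if end < T:
        g += gamma ** (end - t) * q[(states[end], actions[end])]
    return g
```

With n = 1 everywhere this reduces to the ordinary one-step SARSA target, and with n at least as large as the remaining episode length it becomes a Monte Carlo return, which are the two ends of the bias-variance trade-off the abstract describes.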
Taxonomy
Methods: Expected Sarsa · Sarsa
