The Concept of Criticality in Reinforcement Learning

Yitzhak Spielberg; Amos Azaria

arXiv:1810.07254 · cs.LG · October 18, 2018

TL;DR

This paper introduces a novel reinforcement learning framework where each state has a specific n-step update parameter, guided by human-provided criticality measures, to optimize the bias-variance trade-off and improve learning efficiency.

Contribution

It extends n-step algorithms by allowing state-specific n values and incorporates human input on state criticality to enhance RL performance.
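For reference, the standard n-step SARSA return can be written as below, with the step count made a function of the starting state. The notation n(S_t) is an assumption introduced here for illustration; this summary does not give the paper's own notation or say how n is derived from criticality.

```latex
G_{t:t+n} = \sum_{k=0}^{n-1} \gamma^{k} R_{t+k+1} + \gamma^{n} Q(S_{t+n}, A_{t+n}),
\qquad n = n(S_t)
```

With a fixed n this reduces to the usual n-step SARSA target; a small n leans more on the bootstrapped estimate Q (more bias), while a large n leans more on sampled rewards (more variance).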

Findings

01. State-specific n-step updates improve learning efficiency.

02. Human-provided criticality measures guide optimal n selection.

03. The approach adapts the bias-variance trade-off dynamically.

Abstract

Reinforcement learning methods carry a well-known bias-variance trade-off in n-step algorithms for optimal control. Unfortunately, this has rarely been addressed in current research. This trade-off principle holds independently of the choice of algorithm, such as n-step SARSA, n-step Expected SARSA or n-step Tree backup. A small n results in a large bias, while a large n leads to large variance. The literature offers no straightforward recipe for the best choice of this value. While all current n-step algorithms use a fixed value of n over the state space, we extend the framework of n-step updates by allowing each state to have its specific n. We propose a solution to this problem within the context of human-aided reinforcement learning. Our approach is based on the observation that a human can learn more efficiently if she receives input regarding the criticality of a given state…
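As a rough illustration of a state-specific n, the following sketch computes an n-step SARSA target where n is looked up from the state at the start of the update. It is a minimal sketch, not the authors' algorithm: the function name, the `n_of_state` table, and the trajectory layout are all hypothetical, and the table is taken as given because this excerpt does not specify how criticality is turned into n.

```python
from typing import Dict, Hashable, List, Tuple

Step = Tuple[Hashable, Hashable, float]  # (S_k, A_k, R_{k+1})


def state_specific_n_step_target(
    trajectory: List[Step],
    t: int,
    n_of_state: Dict[Hashable, int],  # hypothetical per-state n table
    q: Dict[Tuple[Hashable, Hashable], float],
    gamma: float = 0.9,
    default_n: int = 1,
) -> float:
    """n-step SARSA target at time t, with n chosen by the state S_t.

    The episode is assumed to terminate after the last trajectory entry,
    so no bootstrap term is added once the window reaches the end.
    """
    state = trajectory[t][0]
    n = n_of_state.get(state, default_n)

    # Truncate the look-ahead window at the end of the episode.
    end = min(t + n, len(trajectory))

    # Discounted sum of the sampled rewards inside the window.
    target = 0.0
    for k in range(t, end):
        target += gamma ** (k - t) * trajectory[k][2]

    # Bootstrap from Q if the window stopped before the episode ended.
    if end < len(trajectory):
        s_boot, a_boot, _ = trajectory[end]
        target += gamma ** (end - t) * q.get((s_boot, a_boot), 0.0)
    return target


if __name__ == "__main__":
    traj = [("s0", "a", 1.0), ("s1", "a", 2.0), ("s2", "a", 3.0)]
    q_table = {("s1", "a"): 10.0, ("s2", "a"): 20.0}
    # Same trajectory, different n for the starting state s0.
    for n in (1, 2, 5):
        print(n, state_specific_n_step_target(traj, 0, {"s0": n}, q_table, gamma=0.5))
```

Varying the entry for a single state in `n_of_state` changes how far that state's update looks ahead before bootstrapping, which is the knob the bias-variance trade-off described above turns on.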

Taxonomy

Methods: Expected Sarsa · Sarsa