Performance Dynamics and Termination Errors in Reinforcement Learning: A Unifying Perspective

Nikki Lijing Kuang; Clement H. C. Leung

arXiv:1902.04179·cs.LG·February 13, 2019

TL;DR

This paper analyzes the probability of premature termination errors in reinforcement learning that arise from stochastic reward sequences. It provides mathematical insights and practical mechanisms for reducing such errors, supported by simulations.

Contribution

It offers a unifying combinatorial analysis of termination errors in reinforcement learning and proposes practical methods to mitigate these errors.

Findings

01

Error probability can be mathematically characterized.

02

The probability of premature termination errors can be substantial.

03

Practical mechanisms can effectively reduce termination errors.

Abstract

In reinforcement learning, a decision needs to be made at some point as to whether it is worthwhile to carry on with the learning process or to terminate it. In many such situations, stochastic elements govern the occurrence of rewards, with positive rewards randomly interleaved with negative rewards. For most practical learners, the learning is considered useful if the number of positive rewards always exceeds the number of negative ones. A situation that often calls for learning termination is when the number of negative rewards exceeds the number of positive rewards. However, while this seems reasonable, the error of premature termination, whereby learning is terminated and declared a failure even though the positive rewards would eventually far outnumber the negative ones, can be significant. In this paper, using combinatorial…
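The termination rule described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's combinatorial analysis; it is a simulation under assumptions that are not stated in the abstract: rewards are independent, each is +1 with a fixed probability `p_positive` and -1 otherwise, and the run is observed over a fixed `horizon`. The function name and parameters are hypothetical. A run counts as a premature termination if negative rewards outnumber positive ones at some step, yet positive rewards outnumber negative ones at the end of the horizon.

```python
import random


def premature_termination_rate(p_positive, horizon, trials, seed=0):
    """Estimate how often the 'stop when negatives exceed positives' rule
    would fire on a run that nevertheless ends with more positives.

    Assumption (not from the paper): i.i.d. rewards of +1 / -1.
    """
    rng = random.Random(seed)
    premature = 0
    for _ in range(trials):
        balance = 0              # positives minus negatives so far
        would_terminate = False  # did negatives ever exceed positives?
        for _ in range(horizon):
            balance += 1 if rng.random() < p_positive else -1
            if balance < 0:
                would_terminate = True
        # Premature: the rule fired, but the full run ends net positive.
        if would_terminate and balance > 0:
            premature += 1
    return premature / trials


if __name__ == "__main__":
    for p in (0.55, 0.6, 0.7):
        rate = premature_termination_rate(p, horizon=201, trials=5000)
        print(f"p_positive={p}: estimated premature termination rate {rate:.3f}")
```

Under these assumptions, the estimate shows how a learner whose rewards are favorable on average can still be stopped early by an unlucky initial stretch.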
