Stochastic Reinforcement Learning

Nikki Lijing Kuang; Clement H. C. Leung; and Vienne W. K. Sung

arXiv:1902.04178 · cs.LG · February 13, 2019


TL;DR

This paper introduces a stochastic reinforcement learning approach that explicitly models environmental variability and observation costs, providing criteria for success and probabilistic bounds on costs.

Contribution

It presents a novel stochastic framework for reinforcement learning that accounts for environmental randomness and observation costs, with quantitative analysis of success criteria.

Findings

01. Provides probabilistic bounds on observation costs.
02. Develops criteria for successful learning under stochastic conditions.
03. Models environmental variability explicitly in reinforcement learning.

Abstract

In reinforcement learning episodes, the rewards and punishments are often non-deterministic, and there are invariably stochastic elements governing the underlying situation. Such stochastic elements are often numerous and cannot be known in advance, and they tend to obscure the underlying reward and punishment patterns. Indeed, if stochastic elements were absent, the same outcome would occur every time and the learning problems involved could be greatly simplified. In addition, in most practical situations, the cost of an observation to receive either a reward or punishment can be significant, and one would wish to arrive at the correct learning conclusion by incurring minimum cost. In this paper, we present a stochastic approach to reinforcement learning which explicitly models the variability present in the learning environment and the cost of observation. Criteria and…
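The tension the abstract describes, between noisy outcomes and costly observations, can be made concrete with a small illustration. This is a minimal sketch under stated assumptions, not the paper's method (its criteria and bounds are cut off in the abstract above): each observation is assumed to return +1 (reward) or -1 (punishment), the gap `epsilon` between the true mean outcome and zero is assumed known, and Hoeffding's inequality, a generic concentration bound swapped in here, gives the number of observations needed to conclude the correct sign with error probability at most `delta`. The function names and `cost_per_observation` are hypothetical.

```python
import math
import random


def required_observations(epsilon: float, delta: float) -> int:
    """Hypothetical helper: number of observations n such that, if the true mean
    outcome mu satisfies |mu| >= epsilon, the sample mean has the wrong sign with
    probability at most delta.

    Assumes outcomes are +1 (reward) or -1 (punishment), i.e. they lie in an
    interval of width 2, so Hoeffding gives P(wrong sign) <= exp(-n * epsilon**2 / 2).
    """
    if not (0 < epsilon <= 1 and 0 < delta < 1):
        raise ValueError("need 0 < epsilon <= 1 and 0 < delta < 1")
    return math.ceil(2 * math.log(1 / delta) / epsilon ** 2)


def observation_cost(epsilon: float, delta: float, cost_per_observation: float) -> float:
    """Total cost when every observation costs the same fixed amount (an assumption)."""
    return cost_per_observation * required_observations(epsilon, delta)


def simulate_error_rate(p_reward: float, n: int, trials: int, seed: int = 0) -> float:
    """Fraction of simulated trials in which n noisy observations give the wrong conclusion."""
    if p_reward == 0.5:
        raise ValueError("p_reward must differ from 0.5 so the true sign is defined")
    rng = random.Random(seed)
    true_sign = 1 if p_reward > 0.5 else -1
    errors = 0
    for _ in range(trials):
        # Each observation is a reward (+1) with probability p_reward, else a punishment (-1).
        total = sum(1 if rng.random() < p_reward else -1 for _ in range(n))
        # A tie (total == 0) counts as failing to reach the correct conclusion.
        if total * true_sign <= 0:
            errors += 1
    return errors / trials


n = required_observations(epsilon=0.2, delta=0.05)
cost = observation_cost(epsilon=0.2, delta=0.05, cost_per_observation=0.5)
rate = simulate_error_rate(p_reward=0.6, n=n, trials=2000)
print(n, cost, rate)
```

Here a reward probability of 0.6 gives a mean outcome of 0.2, so `epsilon=0.2` applies; the bound asks for 150 observations (cost 75.0 at 0.5 per observation), and the simulated error rate stays below the 5% target. Because Hoeffding is conservative, the bound overestimates the observations needed, which is exactly the kind of excess cost a sharper, model-specific analysis would aim to reduce.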
