Renewal Monte Carlo: Renewal theory based reinforcement learning

Jayakumar Subramanian; Aditya Mahajan

arXiv:1804.01116·cs.LG·April 5, 2018


TL;DR

Renewal Monte Carlo (RMC) is an online reinforcement learning algorithm for infinite-horizon MDPs. It uses renewal theory to reduce the variance of Monte Carlo estimates and to avoid waiting until the end of an episode to update, and it provably converges to a locally optimal policy.

Contribution

The paper introduces RMC, a Monte Carlo RL algorithm for infinite-horizon problems with a designated start state, which uses renewal theory to express a policy's performance in terms of quantities measured over regenerative cycles.

Findings

01

RMC converges to a locally optimal policy.

02

Two unbiased gradient estimators are proposed and validated.

03

Numerical experiments demonstrate RMC's effectiveness in various scenarios.

Abstract

In this paper, we present an online reinforcement learning algorithm, called Renewal Monte Carlo (RMC), for infinite horizon Markov decision processes with a designated start state. RMC is a Monte Carlo algorithm and retains the advantages of Monte Carlo methods including low bias, simplicity, and ease of implementation while, at the same time, circumventing their key drawbacks of high variance and delayed (end of episode) updates. The key ideas behind RMC are as follows. First, under any reasonable policy, the reward process is ergodic. So, by renewal theory, the performance of a policy is equal to the ratio of expected discounted reward to the expected discounted time over a regenerative cycle. Second, by carefully examining the expression for performance gradient, we propose a stochastic approximation algorithm that only requires estimates of the expected discounted reward and…
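The renewal identity in the abstract can be checked numerically. Let s0 be the start state and τ ≥ 1 the first return time to s0. The strong Markov property gives V(s0) = R + E[γ^τ] V(s0), where R = E[Σ_{t<τ} γ^t r(S_t)] is the expected discounted reward over a cycle. The expected discounted time is T = E[Σ_{t<τ} γ^t] = (1 − E[γ^τ]) / (1 − γ), so the identity rearranges to (1 − γ) V(s0) = R / T. The minimal sketch below assumes a hypothetical 3-state Markov chain induced by some fixed policy. It estimates R and T by averaging over simulated cycles and compares the ratio with (1 − γ) V(s0) computed by value iteration. The chain, rewards, γ, and every function name are illustrative assumptions, not taken from the paper. This illustrates only the performance identity RMC builds on, not the RMC learning algorithm itself.

```python
import random

# Hypothetical 3-state chain induced by some fixed policy (illustrative numbers only).
P = [
    [0.1, 0.6, 0.3],
    [0.5, 0.2, 0.3],
    [0.4, 0.4, 0.2],
]
REWARD = [1.0, 0.0, 2.0]  # hypothetical per-state reward r(s)
GAMMA = 0.9               # discount factor (assumed)
START = 0                 # designated start state, used as the regeneration point


def step(state, rng):
    """Sample the next state from row `state` of P."""
    return rng.choices(range(len(P)), weights=P[state])[0]


def run_cycle(rng):
    """Simulate one regenerative cycle: start at START, stop at the first return.

    Returns (sum_{t<tau} gamma^t r(S_t), sum_{t<tau} gamma^t).
    """
    state, discount = START, 1.0
    cycle_reward, cycle_time = 0.0, 0.0
    while True:
        cycle_reward += discount * REWARD[state]
        cycle_time += discount
        discount *= GAMMA
        state = step(state, rng)
        if state == START:  # renewal: the process has returned to s0
            return cycle_reward, cycle_time


def renewal_estimate(num_cycles, seed=0):
    """Estimate R / T as (mean cycle reward) / (mean cycle time)."""
    rng = random.Random(seed)
    total_r = total_t = 0.0
    for _ in range(num_cycles):
        r, t = run_cycle(rng)
        total_r += r
        total_t += t
    return total_r / total_t


def exact_normalized_value(iters=2000):
    """(1 - gamma) * V(START), with V obtained by iterating V = r + gamma * P V."""
    n = len(P)
    v = [0.0] * n
    for _ in range(iters):
        v = [REWARD[s] + GAMMA * sum(P[s][j] * v[j] for j in range(n)) for s in range(n)]
    return (1 - GAMMA) * v[START]


if __name__ == "__main__":
    print("renewal estimate R/T :", round(renewal_estimate(20000), 3))
    print("exact (1 - g) V(s0)  :", round(exact_normalized_value(), 3))
```

The sketch averages cycle rewards and cycle times separately and then takes their ratio, rather than averaging per-cycle ratios. E[R_cycle / T_cycle] generally differs from E[R_cycle] / E[T_cycle], so only the former construction is a consistent estimate of the performance.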
