Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning

Gal Dalal; Balazs Szorenyi; Gugan Thoppe; Shie Mannor

arXiv:1703.05376·cs.AI·June 6, 2018·37 cites

Open Access

TL;DR

This paper introduces a finite sample analysis for two-timescale stochastic approximation (SA) algorithms, which are widely used in reinforcement learning. It provides the first concentration bound for two-timescale SA and convergence rates for the projected RL algorithms GTD(0), GTD2, and TDC.

Contribution

The paper develops a new recipe for finite sample analysis of two-timescale SA. The recipe gives a lock-in probability bound, and a new projection scheme turns that bound into convergence rates.

Findings

01. First concentration bound for two-timescale SA

02. Convergence rates for GTD(0), GTD2, and TDC algorithms

03. Insights on stepsize selection for RL algorithms

Abstract

Two-timescale Stochastic Approximation (SA) algorithms are widely used in Reinforcement Learning (RL). Their iterates have two parts that are updated using distinct stepsizes. In this work, we develop a novel recipe for their finite sample analysis. Using this, we provide a concentration bound, which is the first such result for a two-timescale SA. The type of bound we obtain is known as 'lock-in probability'. We also introduce a new projection scheme, in which the time between successive projections increases exponentially. This scheme allows one to elegantly transform a lock-in probability into a convergence rate result for projected two-timescale SA. From this latter result, we then extract key insights on stepsize selection. As an application, we finally obtain convergence rates for the projected two-timescale RL algorithms GTD(0), GTD2, and TDC.
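
To make the setting in the abstract concrete, here is a minimal sketch of a projected two-timescale SA iteration on a toy scalar problem. It is an illustration under stated assumptions, not the scheme analyzed in the paper: the drift functions, the polynomial stepsizes, the noise model, the projection radius, and the choice to project at iteration counts that are powers of two are all hypothetical. The slow iterate theta uses the smaller stepsize and the fast iterate w uses the larger one. In this toy system w tracks theta and theta is driven toward 1, so both should settle near 1.

```python
import random


def clip(x, radius):
    """Project a scalar onto the interval [-radius, radius]."""
    return max(-radius, min(radius, x))


def two_timescale_sa(n_steps, slow_exp=0.9, fast_exp=0.6,
                     radius=10.0, noise_std=0.1, seed=0):
    """Toy projected two-timescale SA (all modelling choices are assumptions).

    Slow iterate: theta <- theta + alpha_n * (1 - w + noise)
    Fast iterate: w     <- w     + beta_n  * (theta - w + noise)
    with alpha_n = (n+1)^(-slow_exp) and beta_n = (n+1)^(-fast_exp),
    so alpha_n / beta_n -> 0 when slow_exp > fast_exp.
    """
    rng = random.Random(seed)
    theta, w = 5.0, -5.0      # arbitrary starting point
    next_projection = 1       # project after iterations 1, 2, 4, 8, ...
    n_projections = 0

    for n in range(n_steps):
        alpha = (n + 1) ** -slow_exp   # slow stepsize
        beta = (n + 1) ** -fast_exp    # fast stepsize

        # Both updates use the iterates from the start of the step.
        new_theta = theta + alpha * (1.0 - w + rng.gauss(0.0, noise_std))
        new_w = w + beta * (theta - w + rng.gauss(0.0, noise_std))
        theta, w = new_theta, new_w

        # Sparse projection: the gap between projections doubles each time.
        if n + 1 == next_projection:
            theta, w = clip(theta, radius), clip(w, radius)
            next_projection *= 2
            n_projections += 1

    return theta, w, n_projections


if __name__ == "__main__":
    theta, w, k = two_timescale_sa(20000)
    print(f"theta={theta:.3f}, w={w:.3f}, projections={k}")
```

Because the gap between projections doubles, the number of projections grows only logarithmically with the number of iterations, so for most steps the iteration runs unprojected.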

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Advanced Bandit Algorithms Research