Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples

Tengyu Xu; Shaofeng Zou; Yingbin Liang

arXiv:1909.11907 · cs.LG · September 27, 2019 · 44 citations

TL;DR

This paper provides the first non-asymptotic convergence analysis of the two time-scale TDC algorithm under Markovian samples. It establishes convergence rates for diminishing and constant stepsizes and proposes a new blockwise diminishing stepsize scheme.

Contribution

The paper gives a non-asymptotic analysis of two time-scale TDC with linear function approximation over a non-i.i.d. Markovian sample path, and proposes a TDC variant with blockwise diminishing stepsize that improves convergence.

Findings

01

With diminishing stepsize, TDC can converge as fast as O(log t / t^(2/3))

02

With constant stepsize, TDC converges exponentially fast, but only up to a non-vanishing error

03

With the proposed blockwise diminishing stepsize, TDC converges arbitrarily close to the optimum at a linear rate
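
The three stepsize regimes in the findings can be sketched as simple schedules. This is an illustrative sketch only: the function names, the constants, the decay exponent, the fixed block length, and the geometric decay across blocks are all assumptions made for the example and are not taken from the paper. In a two time-scale method, two such schedules would be used, one for each iterate.

```python
def diminishing_stepsize(t, c=1.0, exponent=0.75):
    """Polynomially decaying stepsize c / (1 + t)^exponent.

    The constant c and the exponent are hypothetical values chosen
    for illustration, not the paper's settings.
    """
    return c / (1 + t) ** exponent


def constant_stepsize(t, c=0.1):
    """Stepsize that ignores the iteration index t (hypothetical c)."""
    return c


def blockwise_stepsize(t, c0=0.1, block_length=100, decay=0.5):
    """Stepsize held constant inside a block and reduced between blocks.

    Assumption: blocks have a fixed length and the stepsize shrinks
    geometrically from one block to the next. The paper's actual block
    lengths and decay rule may differ.
    """
    block_index = t // block_length
    return c0 * decay ** block_index


if __name__ == "__main__":
    for t in (0, 50, 100, 250):
        print(t, diminishing_stepsize(t), constant_stepsize(t), blockwise_stepsize(t))
```

The blockwise schedule combines the two other regimes: inside a block it behaves like a constant stepsize, and the reduction between blocks shrinks the error that a single constant stepsize would leave behind.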

Abstract

Gradient-based temporal difference (GTD) algorithms are widely used in off-policy learning scenarios. Among them, the two time-scale TD with gradient correction (TDC) algorithm has been shown to have superior performance. In contrast to previous studies that characterized the non-asymptotic convergence rate of TDC only under independent and identically distributed (i.i.d.) data samples, we provide the first non-asymptotic convergence analysis for two time-scale TDC under a non-i.i.d. Markovian sample path and linear function approximation. We show that the two time-scale TDC can converge as fast as O(log t / t^(2/3)) under diminishing stepsize, and can converge exponentially fast under constant stepsize, but at the cost of a non-vanishing error. We further propose a TDC algorithm with blockwisely diminishing stepsize, and show that it asymptotically converges with an arbitrarily small…
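
To make the "two time-scale" structure concrete, here is a minimal sketch of one TDC update with linear function approximation. It follows the commonly stated off-policy TDC form with an importance weight; it omits any projection step and is not claimed to match the paper's exact formulation. The names `tdc_step`, `theta`, `w`, `rho`, `alpha`, and `beta` are choices made for this example.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def tdc_step(theta, w, phi, phi_next, reward, rho, alpha, beta, gamma=0.9):
    """One TDC update (illustrative sketch, no projection).

    theta    : value-function weights, updated with stepsize alpha
    w        : auxiliary correction weights, updated with stepsize beta
    phi      : feature vector of the current state
    phi_next : feature vector of the next state
    rho      : importance weight (target policy / behavior policy)
    """
    # Temporal-difference error under the current value weights.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    phi_w = dot(phi, w)

    # Main iterate: TD step plus the gradient-correction term.
    new_theta = [
        th + alpha * rho * (delta * f - gamma * phi_w * fn)
        for th, f, fn in zip(theta, phi, phi_next)
    ]
    # Auxiliary iterate: tracks the projection of the TD error onto the features.
    new_w = [wi + beta * (rho * delta * f - phi_w * f) for wi, f in zip(w, phi)]
    return new_theta, new_w


if __name__ == "__main__":
    theta, w = [0.0, 0.0], [0.0, 0.0]
    theta, w = tdc_step(theta, w, [1.0, 0.0], [0.0, 1.0],
                        reward=1.0, rho=1.0, alpha=0.1, beta=0.5)
    print(theta, w)
```

The two stepsizes `alpha` and `beta` are what make the method two time-scale: they are chosen so that one iterate moves on a faster scale than the other, and under Markovian sampling consecutive `(phi, phi_next, reward)` triples come from a single trajectory rather than independent draws.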

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Age of Information Optimization