Finite-Sample Analysis for SARSA with Linear Function Approximation

Shaofeng Zou; Tengyu Xu; Yingbin Liang

arXiv:1902.02234 · cs.LG · November 20, 2019 · 65 cites

Open Access

TL;DR

This paper presents a novel finite-sample analysis of the SARSA algorithm with linear function approximation in reinforcement learning, addressing non-i.i.d. data and dynamic policies, and introduces a bias characterization technique for stochastic approximation.

Contribution

It develops a new bias characterization method for stochastic approximation with time-varying Markov kernels, enabling finite-sample convergence analysis of SARSA and its variants.

Findings

01. Finite-sample bounds for SARSA's mean square error.

02. Analysis of a more general fitted SARSA algorithm.

03. Insights into on-policy policy iteration efficiency.

Abstract

SARSA is an on-policy algorithm for learning a policy for a Markov decision process in reinforcement learning. We investigate the SARSA algorithm with linear function approximation under non-i.i.d. data, where only a single sample trajectory is available. With a Lipschitz continuous policy improvement operator that is smooth enough, SARSA has been shown to converge asymptotically (Perkins and Precup, 2003; Melo et al., 2008). However, its non-asymptotic analysis is challenging and remains unsolved, because the samples are non-i.i.d. and the behavior policy changes dynamically with time. In this paper, we develop a novel technique to explicitly characterize the stochastic bias of a class of stochastic approximation procedures with time-varying Markov transition kernels. Our approach enables non-asymptotic convergence analyses of this class of stochastic approximation algorithms, which may be…
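To make the setting in the abstract concrete, here is a minimal sketch of SARSA with linear function approximation run on a single trajectory. It is an illustration under stated assumptions, not the authors' implementation: the softmax policy stands in for a Lipschitz continuous policy improvement operator, the projection onto a ball is an assumed stabilizing step, and `make_toy_mdp`, `softmax_policy`, `project`, `sarsa_linear`, and all hyperparameters are hypothetical names and values chosen for this example.

```python
import numpy as np


def make_toy_mdp(n_states=4, n_actions=2, d=3, seed=1):
    """Hypothetical random MDP with bounded features (illustration only)."""
    rng = np.random.default_rng(seed)
    # P[s, a] is a probability vector over next states.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    phi = rng.normal(size=(n_states, n_actions, d))
    # Rescale so every feature vector has Euclidean norm at most 1.
    phi /= np.maximum(1.0, np.linalg.norm(phi, axis=-1, keepdims=True))
    return P, R, phi


def softmax_policy(theta, phi, state, temperature=1.0):
    """Action probabilities from a softmax over linear Q-values.

    Assumption: the softmax plays the role of a smooth (Lipschitz)
    policy improvement operator; it varies continuously with theta.
    """
    q = phi[state] @ theta              # one Q-value per action
    z = (q - q.max()) / temperature     # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()


def project(theta, radius):
    """Project theta onto the Euclidean ball of the given radius."""
    norm = np.linalg.norm(theta)
    return theta if norm <= radius else theta * (radius / norm)


def sarsa_linear(P, R, phi, gamma=0.9, alpha=0.05, steps=5000,
                 radius=10.0, seed=0):
    """Run SARSA with linear Q(s, a) = phi[s, a] @ theta on one trajectory."""
    rng = np.random.default_rng(seed)
    n_states, n_actions, d = phi.shape
    theta = np.zeros(d)
    s = 0
    a = rng.choice(n_actions, p=softmax_policy(theta, phi, s))
    for _ in range(steps):
        s_next = rng.choice(n_states, p=P[s, a])
        r = R[s, a]
        # On-policy: the next action comes from the current, changing policy,
        # so consecutive samples are neither independent nor identically
        # distributed.
        a_next = rng.choice(n_actions, p=softmax_policy(theta, phi, s_next))
        td_error = r + gamma * phi[s_next, a_next] @ theta - phi[s, a] @ theta
        theta = project(theta + alpha * td_error * phi[s, a], radius)
        s, a = s_next, a_next
    return theta
```

Calling `sarsa_linear(*make_toy_mdp())` returns the final parameter vector. Because the behavior policy depends on `theta`, the Markov transition kernel that generates the data changes at every step, which is the difficulty the abstract highlights.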

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Optimization and Search Problems

Methods: SARSA