Finite-Sample Analysis for SARSA with Linear Function Approximation
Shaofeng Zou, Tengyu Xu, Yingbin Liang

TL;DR
This paper presents a novel finite-sample analysis of the SARSA algorithm with linear function approximation in reinforcement learning, addressing non-i.i.d. data and dynamic policies, and introduces a bias characterization technique for stochastic approximation.
Contribution
It develops a new bias characterization method for stochastic approximation with time-varying Markov kernels, enabling finite-sample convergence analysis of SARSA and its variants.
Findings
Finite-sample bounds for SARSA's mean square error.
Analysis of a more general fitted SARSA algorithm.
Insights into on-policy policy iteration efficiency.
Abstract
SARSA is an on-policy algorithm to learn a Markov decision process policy in reinforcement learning. We investigate the SARSA algorithm with linear function approximation under non-i.i.d. data, where a single sample trajectory is available. With a Lipschitz continuous policy improvement operator that is smooth enough, SARSA has been shown to converge asymptotically \cite{perkins2003convergent,melo2008analysis}. However, its non-asymptotic analysis is challenging and remains unsolved due to the non-i.i.d. samples and the fact that the behavior policy changes dynamically with time. In this paper, we develop a novel technique to explicitly characterize the stochastic bias of a type of stochastic approximation procedure with time-varying Markov transition kernels. Our approach enables non-asymptotic convergence analyses of this type of stochastic approximation algorithm, which may be…
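To make the setting in the abstract concrete, the following is a minimal sketch of SARSA with linear function approximation run on a single sample trajectory. It is an illustration under stated assumptions, not the authors' implementation: the random MDP, the feature map, the softmax policy (used here as one example of a policy improvement operator that is smooth in the parameter), the projection radius, and the step-size schedule are all hypothetical choices, as are the function names `softmax_policy`, `project_to_ball`, and `run_sarsa`.

```python
import numpy as np


def softmax_policy(theta, features, state, temperature=1.0):
    """Action probabilities from a softmax over linear Q-values.

    The softmax is an assumed example of a policy that varies smoothly
    with theta; the source only requires a Lipschitz improvement operator.
    """
    q_values = features[state] @ theta            # shape: (num_actions,)
    logits = (q_values - q_values.max()) / temperature
    probs = np.exp(logits)
    return probs / probs.sum()


def project_to_ball(theta, radius):
    """Project theta onto the Euclidean ball of the given radius (assumed step)."""
    norm = np.linalg.norm(theta)
    return theta if norm <= radius else theta * (radius / norm)


def run_sarsa(num_steps=5000, gamma=0.9, radius=10.0, seed=0):
    """Run SARSA with linear Q(s, a) = phi(s, a)^T theta on one trajectory."""
    rng = np.random.default_rng(seed)
    num_states, num_actions, dim = 5, 2, 4

    # Hypothetical random MDP: transition[s, a] is a distribution over next states.
    transition = rng.dirichlet(np.ones(num_states), size=(num_states, num_actions))
    reward = rng.uniform(0.0, 1.0, size=(num_states, num_actions))

    # Hypothetical features, normalized so that each phi(s, a) has unit norm.
    features = rng.normal(size=(num_states, num_actions, dim))
    features /= np.linalg.norm(features, axis=-1, keepdims=True)

    theta = np.zeros(dim)
    state = 0
    action = rng.choice(num_actions, p=softmax_policy(theta, features, state))

    for t in range(num_steps):
        step_size = 1.0 / (t + 10)                # assumed diminishing step size
        next_state = rng.choice(num_states, p=transition[state, action])
        # The behavior policy depends on the current theta, so it changes over time.
        next_action = rng.choice(
            num_actions, p=softmax_policy(theta, features, next_state)
        )
        phi = features[state, action]
        phi_next = features[next_state, next_action]
        td_error = reward[state, action] + gamma * (phi_next @ theta) - phi @ theta
        theta = project_to_ball(theta + step_size * td_error * phi, radius)
        state, action = next_state, next_action   # samples are not i.i.d.

    return theta


if __name__ == "__main__":
    print(run_sarsa())
```

The two difficulties named in the abstract both appear in the loop: consecutive samples come from one Markovian trajectory, and the action distribution is recomputed from the current `theta` at every step, so the transition kernel generating the data varies with time.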
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Optimization and Search Problems
Methods: SARSA
