Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation
Anna Winnicki, R. Srikant

TL;DR
This paper presents performance guarantees for two reinforcement learning algorithms that use stochastic approximation, lookahead, and function approximation to efficiently control large Markov decision processes.
Contribution
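The paper analyzes two algorithms for simulation-based policy iteration in large MDPs: one that recomputes the feature weights by least squares minimization at each iteration, and one that takes several gradient descent steps toward the least squares solution within a two-time-scale stochastic approximation scheme.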
Findings
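Performance guarantees are established for both algorithms.
Both algorithms are designed for very large MDPs and combine lookahead with linear function approximation.
The analysis covers both the least squares variant and the two-time-scale stochastic approximation variant.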
Abstract
We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes that involves the use of stochastic approximation algorithms along with state-of-the-art techniques that are useful for very large MDPs, including lookahead, function approximation, and gradient descent. Specifically, we analyze two algorithms. The first involves a least squares approach in which a new set of weights associated with feature vectors is obtained via least squares minimization at each iteration. The second involves a two-time-scale stochastic approximation algorithm that takes several steps of gradient descent towards the least squares solution before obtaining the next iterate using a stochastic approximation algorithm.
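The contrast the abstract draws between the two weight updates can be illustrated with a minimal sketch. It is not the authors' algorithm: it omits lookahead, policy evaluation, and the stochastic approximation step, and only compares an exact least squares fit of feature weights with a few gradient descent steps toward that same solution. The function names, the synthetic data, and the step size choice are all assumptions made for illustration.

```python
import numpy as np


def least_squares_weights(features, targets):
    """Exact minimizer of ||features @ theta - targets||^2 (hypothetical helper)."""
    theta, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return theta


def gradient_descent_weights(features, targets, theta0, step_size, num_steps):
    """Take num_steps gradient steps on 0.5 * ||features @ theta - targets||^2."""
    theta = theta0.copy()
    for _ in range(num_steps):
        grad = features.T @ (features @ theta - targets)
        theta = theta - step_size * grad
    return theta


# Synthetic stand-in for noisy value estimates at sampled states (assumption).
rng = np.random.default_rng(0)
features = rng.normal(size=(20, 3))          # one feature vector per sampled state
true_theta = np.array([1.0, -2.0, 0.5])
targets = features @ true_theta + 0.1 * rng.normal(size=20)

theta_ls = least_squares_weights(features, targets)

# Step size 1/L, where L is the largest eigenvalue of features.T @ features.
step_size = 1.0 / np.linalg.norm(features, 2) ** 2
theta_start = np.zeros(3)
theta_few = gradient_descent_weights(features, targets, theta_start, step_size, 5)
theta_many = gradient_descent_weights(features, targets, theta_start, step_size, 2000)

print("least squares:        ", theta_ls)
print("5 gradient steps:     ", theta_few)
print("2000 gradient steps:  ", theta_many)
```

In this sketch a handful of gradient steps moves the weights only part of the way to the least squares solution, which is the trade-off the second algorithm accepts in exchange for avoiding a full least squares solve at every iteration.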
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Software Reliability and Analysis Research
