Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation

Anna Winnicki; R. Srikant

arXiv:2210.07338 · cs.LG · October 17, 2022

TL;DR

This paper establishes performance guarantees for two reinforcement learning algorithms that combine stochastic approximation, lookahead, and linear function approximation to control large Markov decision processes (MDPs).

Contribution

The paper introduces and analyzes two novel policy iteration algorithms for large MDPs that combine least squares fitting of linear feature weights with stochastic approximation.
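The following is a minimal sketch, not the authors' algorithm, of how lookahead, a least squares fit of linear feature weights, and a stochastic approximation step can be combined in one policy iteration loop. It assumes a small, fully known MDP (transition array P of shape (A, n, n), rewards R of shape (A, n), features Phi of shape (n, d)) and replaces the paper's simulation-based, unbiased policy evaluation with exact applications of the policy's Bellman operator. The function names and the parameters H (lookahead depth), m (number of evaluation steps) and alpha (step size) are hypothetical.

```python
import numpy as np


def bellman_q(P, R, J, gamma):
    # Q[a, s] = R[a, s] + gamma * sum_{s'} P[a, s, s'] * J[s']
    return R + gamma * (P @ J)


def lookahead_policy(P, R, J, gamma, H):
    # Apply the Bellman optimality operator T to J (H - 1) times, then act
    # greedily: an H-step lookahead policy. Also returns T^(H-1) J.
    for _ in range(H - 1):
        J = bellman_q(P, R, J, gamma).max(axis=0)
    return bellman_q(P, R, J, gamma).argmax(axis=0), J


def apply_policy_operator(P, R, policy, J, gamma, m):
    # Apply T_mu to J m times using the known model (an assumed stand-in for
    # the paper's simulation-based policy evaluation).
    states = np.arange(J.shape[0])
    P_mu = P[policy, states, :]  # row s is P[policy[s], s, :]
    R_mu = R[policy, states]
    for _ in range(m):
        J = R_mu + gamma * (P_mu @ J)
    return J


def ls_sa_policy_iteration(P, R, Phi, gamma, H=2, m=5, alpha=0.5, iters=200):
    # Hypothetical name and parameters; one reading of the first algorithm.
    theta = np.zeros(Phi.shape[1])
    for _ in range(iters):
        policy, J_look = lookahead_policy(P, R, Phi @ theta, gamma, H)
        target = apply_policy_operator(P, R, policy, J_look, gamma, m)
        # Least squares: feature weights whose span best fits the target values.
        theta_ls, *_ = np.linalg.lstsq(Phi, target, rcond=None)
        # Stochastic approximation: move only a fraction alpha toward theta_ls.
        theta = (1 - alpha) * theta + alpha * theta_ls
    policy, _ = lookahead_policy(P, R, Phi @ theta, gamma, H)
    return theta, policy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, A, d = 20, 3, 5
    P = rng.random((A, n, n))
    P /= P.sum(axis=2, keepdims=True)  # normalize rows into distributions
    R = rng.random((A, n))
    Phi = np.eye(d)[np.arange(n) % d]  # state-aggregation (one-hot) features
    theta, policy = ls_sa_policy_iteration(P, R, Phi, gamma=0.9)
    print("weights:", theta)
    print("policy:", policy)
```

The constant step size alpha = 0.5 is only an illustrative choice; stochastic approximation analyses typically use diminishing step sizes to average out simulation noise, which this noise-free sketch does not have.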

Findings

1. Performance guarantees are established for both algorithms.
2. Large MDPs are handled using linear function approximation.
3. The convergence of the proposed methods is analyzed.

Abstract

We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes (MDPs) that uses stochastic approximation algorithms together with state-of-the-art techniques that are useful for very large MDPs, including lookahead, function approximation, and gradient descent. Specifically, we analyze two algorithms. The first is a least squares approach in which, at each iteration, a new set of weights associated with the feature vectors is obtained via least squares minimization. The second is a two-time-scale stochastic approximation algorithm that takes several steps of gradient descent towards the least squares solution before obtaining the next iterate using a stochastic approximation algorithm.
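A hedged sketch of the second algorithm's structure follows: instead of solving the least squares problem exactly, a fast iterate w takes a fixed number of gradient descent steps toward the least squares solution (warm-started from its previous value), and the slow iterate theta then moves toward w through a stochastic approximation step. The known model, the exact (noise-free) targets, the constant step sizes and all names (two_timescale_pi, lookahead_target, H, m, alpha, inner_steps) are assumptions for illustration; the paper's step-size schedules and sampling scheme are not reproduced here.

```python
import numpy as np


def bellman_q(P, R, J, gamma):
    # Q[a, s] = R[a, s] + gamma * sum_{s'} P[a, s, s'] * J[s']
    return R + gamma * (P @ J)


def lookahead_target(P, R, J, gamma, H, m):
    # H-step lookahead greedy policy mu, then T_mu applied m times to T^(H-1) J.
    # Exact model-based evaluation stands in for simulation (an assumption).
    for _ in range(H - 1):
        J = bellman_q(P, R, J, gamma).max(axis=0)
    policy = bellman_q(P, R, J, gamma).argmax(axis=0)
    states = np.arange(J.shape[0])
    P_mu, R_mu = P[policy, states, :], R[policy, states]
    for _ in range(m):
        J = R_mu + gamma * (P_mu @ J)
    return policy, J


def two_timescale_pi(P, R, Phi, gamma, H=2, m=5, alpha=0.5, inner_steps=30, iters=200):
    # Hypothetical name and parameters; a sketch of the second algorithm.
    n, d = Phi.shape
    # Gradient descent on f(w) = ||Phi w - y||^2 / (2n); its gradient is
    # Lipschitz with L = sigma_max(Phi)^2 / n, so the step 1 / L is stable.
    beta = n / np.linalg.norm(Phi, 2) ** 2
    theta = np.zeros(d)  # slow iterate (updated by stochastic approximation)
    w = np.zeros(d)      # fast iterate, warm-started across outer iterations
    for _ in range(iters):
        _, y = lookahead_target(P, R, Phi @ theta, gamma, H, m)
        for _ in range(inner_steps):
            # A few gradient steps toward the least squares solution, not an exact solve.
            w = w - beta * (Phi.T @ (Phi @ w - y)) / n
        theta = (1 - alpha) * theta + alpha * w
    policy, _ = lookahead_target(P, R, Phi @ theta, gamma, H, 0)
    return theta, policy
```

Warm-starting w means that a small number of inner gradient steps can suffice when consecutive targets change slowly, which is one motivation for letting w run on a faster time scale than theta.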

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Software Reliability and Analysis Research