The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation

Anna Winnicki; Joseph Lubars; Michael Livesay; R. Srikant

arXiv:2109.13419 · cs.LG · December 15, 2022

TL;DR

This paper quantifies how lookahead (used for policy improvement) and m-step rollout (used for policy evaluation) affect the stability and convergence of approximate dynamic programming in reinforcement learning with linear value function approximation.

Contribution

The paper provides the first quantitative characterization of how lookahead and m-step rollout affect approximate DP with function approximation, covering both convergence and performance.

Findings

1. Without sufficient lookahead and rollout, approximate DP may not converge.

2. Lookahead and rollout improve convergence rates of approximate DP.

3. Lookahead mitigates the effects of function approximation errors and discount factors.

Abstract

Function approximation is widely used in reinforcement learning to handle the computational difficulties associated with very large state spaces. However, function approximation introduces errors which may lead to instabilities when using approximate dynamic programming techniques to obtain the optimal policy. Therefore, techniques such as lookahead for policy improvement and m-step rollout for policy evaluation are used in practice to improve the performance of approximate dynamic programming with function approximation. We quantitatively characterize, for the first time, the impact of lookahead and m-step rollout on the performance of approximate dynamic programming (DP) with function approximation: (i) without a sufficient combination of lookahead and m-step rollout, approximate DP may not converge, (ii) both lookahead and m-step rollout improve the convergence rate of approximate…
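The two techniques named in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-state MDP, the names `lookahead_policy` and `rollout`, and the choice to apply the rollout to the lookahead value. The sketch is tabular and noise-free, so it leaves out the linear function approximation step entirely. It treats H-step lookahead as acting greedily after H-1 applications of the Bellman optimality operator, and m-step rollout as m applications of the Bellman operator of a fixed policy.

```python
GAMMA = 0.9

# Hypothetical two-state, two-action MDP (not from the paper).
# P[a][s][s2] is the probability of moving from s to s2 under action a.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in the current state
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch to the other state
]
# R[s][a] is the reward for taking action a in state s.
R = [[0.0, 1.0], [2.0, 0.0]]


def q_values(V, s):
    """One-step action values at state s, bootstrapping from V."""
    n = len(V)
    return [
        R[s][a] + GAMMA * sum(P[a][s][s2] * V[s2] for s2 in range(n))
        for a in range(len(P))
    ]


def bellman_optimality(V):
    """Apply the Bellman optimality operator once."""
    return [max(q_values(V, s)) for s in range(len(V))]


def bellman_policy(V, policy):
    """Apply the Bellman operator of a fixed deterministic policy once."""
    return [q_values(V, s)[policy[s]] for s in range(len(V))]


def greedy(V):
    """Greedy policy with respect to V (ties go to the lowest action index)."""
    policy = []
    for s in range(len(V)):
        q = q_values(V, s)
        policy.append(q.index(max(q)))
    return policy


def lookahead_policy(V, H):
    """H-step lookahead: improve V with H-1 optimality backups, then act greedily.

    Returns the policy and the value estimate it was greedy with respect to.
    """
    for _ in range(H - 1):
        V = bellman_optimality(V)
    return greedy(V), V


def rollout(V, policy, m):
    """m-step rollout: m backups under a fixed policy, starting from V."""
    for _ in range(m):
        V = bellman_policy(V, policy)
    return V


if __name__ == "__main__":
    V0 = [10.0, 0.0]  # deliberately misleading initial value estimate
    for H in (1, 2):
        policy, V_look = lookahead_policy(V0, H)
        print(H, policy, rollout(V_look, policy, m=5))
```

In this toy example the optimal policy is to switch in state 0 and stay in state 1, with optimal values 19 and 20. Starting from the misleading estimate `[10.0, 0.0]`, one-step lookahead selects `[0, 1]`, the wrong action in both states, while two-step lookahead selects the optimal `[1, 0]`. Increasing `m` in `rollout` moves the evaluation closer to the true value of the selected policy.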

Taxonomy

Topics: Mechanical Circulatory Support Devices · Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics