The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation
Anna Winnicki, Joseph Lubars, Michael Livesay, R. Srikant

TL;DR
This paper quantifies how lookahead (used for policy improvement) and m-step rollout (used for policy evaluation) affect the stability and convergence of approximate dynamic programming in reinforcement learning with linear value function approximation.
Contribution
It provides the first quantitative characterization of the effects of lookahead and rollout on approximate DP with function approximation, including convergence and performance impacts.
Findings
Without a sufficient combination of lookahead and m-step rollout, approximate DP may not converge.
Both lookahead and m-step rollout improve the convergence rate of approximate DP.
Lookahead helps mitigate the effect of function approximation error and of the discount factor on performance.
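For readers unfamiliar with the two techniques named in the findings, they can be written in standard dynamic programming notation. The symbols below are assumptions, since this summary defines none: T is the Bellman optimality operator, T_μ the Bellman operator of a policy μ, γ the discount factor, J a value estimate, H the lookahead depth, and m the rollout length.

```latex
(TJ)(s) = \max_{a} \Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, J(s') \Big],
\qquad
(T_\mu J)(s) = r(s,\mu(s)) + \gamma \sum_{s'} P(s' \mid s,\mu(s))\, J(s')

% H-step lookahead (policy improvement): pick \mu that is greedy with respect to T^{H-1} J
T_\mu T^{H-1} J = T^{H} J

% m-step rollout (policy evaluation): approximate J^\mu by m applications of T_\mu
J^\mu \approx T_\mu^{m}\, T^{H-1} J
```

With H = 1 and m = 1 this reduces to an ordinary greedy step followed by a single Bellman backup. Larger H and m apply more contractions per iteration, which is the intuition behind the convergence claims above.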
Abstract
Function approximation is widely used in reinforcement learning to handle the computational difficulties associated with very large state spaces. However, function approximation introduces errors which may lead to instabilities when using approximate dynamic programming techniques to obtain the optimal policy. Therefore, techniques such as lookahead for policy improvement and m-step rollout for policy evaluation are used in practice to improve the performance of approximate dynamic programming with function approximation. We quantitatively characterize, for the first time, the impact of lookahead and m-step rollout on the performance of approximate dynamic programming (DP) with function approximation: (i) without a sufficient combination of lookahead and m-step rollout, approximate DP may not converge, (ii) both lookahead and m-step rollout improve the convergence rate of approximate…
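The scheme the abstract describes can be illustrated with a minimal sketch of one common formulation: approximate policy iteration with H-step lookahead, m-step rollout, and a least-squares fit to linear features. Everything here is an illustrative assumption rather than the authors' code, including the function names, the array layout (`P` of shape actions × states × states, `R` of shape states × actions), the random MDP, and the noise-free rollout.

```python
import numpy as np

def bellman_backup(J, P, R, gamma):
    """Apply the Bellman optimality operator T once.
    P: (A, S, S) transition probabilities, R: (S, A) rewards.
    Returns (T J, greedy action for each state)."""
    Q = R + gamma * np.einsum("ast,t->sa", P, J)
    return Q.max(axis=1), Q.argmax(axis=1)

def lookahead_policy(J, P, R, gamma, H):
    """H-step lookahead: compute T^{H-1} J, then act greedily on it.
    Returns (policy, T^{H-1} J)."""
    for _ in range(H - 1):
        J, _ = bellman_backup(J, P, R, gamma)
    _, mu = bellman_backup(J, P, R, gamma)
    return mu, J

def rollout(J, mu, P, R, gamma, m):
    """m-step rollout: apply the policy operator T_mu to J m times."""
    idx = np.arange(len(J))
    P_mu = P[mu, idx]   # row s is the transition row of action mu[s]
    r_mu = R[idx, mu]   # reward of action mu[s] in state s
    for _ in range(m):
        J = r_mu + gamma * P_mu @ J
    return J

def fit_linear(Phi, targets, states):
    """Least-squares fit of targets on the sampled states, then
    generalize to all states through the feature matrix Phi."""
    theta, *_ = np.linalg.lstsq(Phi[states], targets[states], rcond=None)
    return Phi @ theta

def api_step(J, Phi, states, P, R, gamma, H, m):
    """One iteration: lookahead -> rollout -> linear fit."""
    mu, J_look = lookahead_policy(J, P, R, gamma, H)
    targets = rollout(J_look, mu, P, R, gamma, m)
    return fit_linear(Phi, targets, states), mu

# Hypothetical toy problem: random MDP with 6 states, 2 actions, 3 features.
rng = np.random.default_rng(0)
S, A, gamma = 6, 2, 0.9
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)   # normalize rows into distributions
R = rng.random((S, A))
Phi = rng.random((S, 3))

J = np.zeros(S)
for _ in range(20):
    J, mu = api_step(J, Phi, np.arange(S), P, R, gamma, H=3, m=5)
print("policy:", mu)
```

Varying `H` and `m` in the final loop is one way to explore the convergence behaviour the abstract refers to. This toy sketch is not meant to reproduce the paper's quantitative results.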
Taxonomy
Topics: Mechanical Circulatory Support Devices · Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics
