On-Line Policy Iteration for Infinite Horizon Dynamic Programming

Dimitri Bertsekas

arXiv:2106.00746 · math.OC · June 3, 2021

Open Access

TL;DR

This paper introduces an on-line policy iteration algorithm for finite-state, infinite horizon discounted dynamic programming. Policy improvement is performed only at the states encountered during system operation, so the current policy is improved continuously as the system runs.

Contribution

It presents an on-line PI method that updates the policy during operation and can be combined with on-line replanning and with value and policy approximations.

Findings

1. Converges in a finite number of stages to a type of locally optimal policy.

2. Enables real-time policy updates during system operation.

3. Compatible with value and policy approximation methods.

Abstract

In this paper we propose an on-line policy iteration (PI) algorithm for finite-state infinite horizon discounted dynamic programming, whereby the policy improvement operation is done on-line, only for the states that are encountered during operation of the system. This allows the continuous updating/improvement of the current policy, thus resulting in a form of on-line PI that incorporates the improved controls into the current policy as new states and controls are generated. The algorithm converges in a finite number of stages to a type of locally optimal policy, and suggests the possibility of variants of PI and multiagent PI where the policy improvement is simplified. Moreover, the algorithm can be used with on-line replanning, and is also well-suited for on-line PI algorithms with value and policy approximations.
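The idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the data layout (`P[x][u][y]` for transition probabilities, `g[x][u]` for stage costs), the cost-minimization convention, the iterative policy evaluation, and the tie-breaking rule are all assumptions made for illustration. At each step the current policy is evaluated, the control is improved only at the state currently occupied, and the system then moves to the next state.

```python
import random


def evaluate_policy(P, g, policy, alpha, sweeps=500):
    """Approximate the cost J of a fixed policy by repeated Bellman sweeps.

    Assumed layout: P[x][u][y] is the probability of moving x -> y under
    control u, and g[x][u] is the stage cost. alpha is the discount factor.
    """
    n = len(policy)
    J = [0.0] * n
    for _ in range(sweeps):
        J = [
            g[x][policy[x]]
            + alpha * sum(P[x][policy[x]][y] * J[y] for y in range(n))
            for x in range(n)
        ]
    return J


def q_factor(P, g, J, alpha, x, u):
    """One-step lookahead cost of applying control u at state x, then following J."""
    return g[x][u] + alpha * sum(p * J[y] for y, p in enumerate(P[x][u]))


def online_policy_iteration(P, g, policy, alpha, x0, steps, seed=0):
    """Run `steps` stages, improving the policy only at the visited states."""
    rng = random.Random(seed)
    policy = list(policy)  # do not mutate the caller's list
    n = len(policy)
    x = x0
    for _ in range(steps):
        J = evaluate_policy(P, g, policy, alpha)
        # Policy improvement at the current state only. The current control
        # is kept unless another control is strictly better (an assumed
        # tie-breaking rule for this sketch).
        best_u = policy[x]
        best_q = q_factor(P, g, J, alpha, x, best_u)
        for u in range(len(g[x])):
            q = q_factor(P, g, J, alpha, x, u)
            if q < best_q - 1e-12:
                best_u, best_q = u, q
        policy[x] = best_u
        # Apply the improved control and sample the next state.
        x = rng.choices(range(n), weights=P[x][best_u])[0]
    return policy, x


# Hypothetical two-state, two-control example with deterministic transitions.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: control 0 stays, control 1 moves to 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: control 0 stays, control 1 moves to 0
]
g = [
    [1.0, 2.0],  # costs at state 0
    [0.0, 5.0],  # costs at state 1
]
final_policy, final_state = online_policy_iteration(
    P, g, policy=[0, 0], alpha=0.9, x0=0, steps=5
)
print(final_policy, final_state)  # [1, 0] 1
```

In this toy problem the policy is changed once, at state 0, after which the system settles in state 1 and no further changes occur. States that are never visited keep their initial controls, which is one way to read the abstract's "type of locally optimal policy".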

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Optimization and Search Problems