A Theoretical Difficulty in Approximate Dynamic Programming with Input Constraints

Xuefeng Bao, Zhi-Hong Mao, and Nitin Sharma

arXiv:1805.06424·math.OC·May 24, 2018

TL;DR

This paper examines a possible theoretical gap in the convergence of policy iteration for approximate dynamic programming (ADP) with input constraints: each updated policy is evaluated along the same trajectory, even though the control policy that produced that trajectory has already changed.

Contribution

It identifies a possible theoretical difficulty in the convergence of existing constrained ADP policy iteration algorithms and attributes it to forcibly evaluating the same trajectory after the control policy has changed.

Findings

01. The convergence of the existing policy iteration algorithm for constrained ADP may have a theoretical difficulty.

02. The difficulty may be caused by evaluating the same trajectory even though the control policy has already changed.

03. The issue bears on applying ADP to systems with actuator limitations.

Abstract

Equipping approximate dynamic programming (ADP) with input constraints is of great significance, because it enables ADP to be applied to systems with actuator limitations, which are common in dynamical systems. In a conventional constrained ADP framework, the optimal control is searched via a policy iteration algorithm, in which the value under a constrained control is solved from a Hamilton-Jacobi-Bellman (HJB) equation, while the constrained control policy is improved based on the current estimated value. This concise and applicable method has been widely used. However, the convergence of the existing policy iteration algorithm may possess a theoretical difficulty, which might be caused by forcibly evaluating the same trajectory even though the control policy has already changed. This paper explores this problem.
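
For orientation, the two steps named in the abstract can be written out in one common form from the constrained-ADP literature. This is a sketch under assumptions, not necessarily the formulation the paper uses: the control-affine dynamics, the saturation bound \lambda, the state penalty Q(x), the weight matrix R, and the tanh-based input penalty W(u) are all illustrative choices that the abstract does not state.

```latex
% Assumed setting (not stated in the abstract):
%   dynamics  \dot{x} = f(x) + g(x)\,u,  with input bound |u_j| \le \lambda
%   cost      V(x_0) = \int_0^\infty \big[ Q(x) + W(u) \big]\,dt
%   penalty   W(u) = 2 \int_0^{u} \lambda \tanh^{-1}(v/\lambda)^{\top} R \, dv

% Policy evaluation: given the constrained policy u^{(i)}, solve for V^{(i)}
0 = Q(x) + W\big(u^{(i)}(x)\big)
    + \nabla V^{(i)}(x)^{\top} \big[ f(x) + g(x)\,u^{(i)}(x) \big],
\qquad V^{(i)}(0) = 0

% Policy improvement: update the constrained policy from the current value
u^{(i+1)}(x) = -\lambda \tanh\!\Big( \tfrac{1}{2\lambda}\, R^{-1} g(x)^{\top} \nabla V^{(i)}(x) \Big)
```

In this form the evaluation step depends on the closed-loop vector field f(x) + g(x)u^{(i)}(x), so the trajectory along which the value is evaluated changes whenever the policy does. That dependence is the point the abstract raises about evaluating the same trajectory after a policy update.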

Taxonomy

Topics: Adaptive Dynamic Programming Control · Mechanical Circulatory Support Devices · Reinforcement Learning in Robotics