Online Abstract Dynamic Programming with Contractive Models
Xiuxian Li, Lihua Xie

TL;DR
This paper develops and analyzes online dynamic programming algorithms for time-varying abstract models, providing theoretical bounds on their tracking errors and demonstrating their effectiveness through examples.
Contribution
It introduces and analyzes several online DP algorithms for time-varying models, with new theoretical error bounds based on the variation of the mappings.
Findings
Tracking error bounds depend on the largest difference between consecutive mappings.
The algorithms effectively track time-varying optimal costs and policies.
Examples validate the theoretical error bounds and algorithm performance.
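The dependence of the tracking error on the variation between consecutive mappings can be illustrated with a minimal sketch of online VI, in which exactly one Bellman update is applied per time step using the mapping available at that step. Everything specific here is an assumption made for illustration and is not taken from the paper: the small discounted MDP, the discount factor `ALPHA`, the sinusoidal stage cost `stage_cost`, and the helper names. The bound checked in the usage note is the elementary one that follows from the contraction property alone (if consecutive mappings differ by at most δ in the sup norm, the error eventually stays below δ/(1-α)²), and it is not necessarily the bound stated in the paper.

```python
import math
import random

ALPHA = 0.8                   # discount factor, i.e. contraction modulus (assumed)
N_STATES, N_ACTIONS = 4, 2    # toy problem size (assumed)


def make_transitions(rng):
    """Random but time-invariant transition probabilities P[s][a][s']."""
    P = []
    for _ in range(N_STATES):
        rows = []
        for _ in range(N_ACTIONS):
            w = [rng.random() + 0.1 for _ in range(N_STATES)]
            total = sum(w)
            rows.append([x / total for x in w])
        P.append(rows)
    return P


def stage_cost(t, s, a, drift):
    """Hypothetical stage cost g_t(s, a); it changes by at most `drift` per step."""
    return 1.0 + math.sin(drift * t + s + 2 * a)


def bellman(J, P, t, drift):
    """Time-t mapping: (T_t J)(s) = min_a [g_t(s, a) + ALPHA * sum_s' P * J(s')]."""
    return [
        min(
            stage_cost(t, s, a, drift)
            + ALPHA * sum(p * j for p, j in zip(P[s][a], J))
            for a in range(N_ACTIONS)
        )
        for s in range(N_STATES)
    ]


def fixed_point(P, t, drift, iters=200):
    """Approximate the optimal cost J*_t by iterating the frozen mapping T_t."""
    J = [0.0] * N_STATES
    for _ in range(iters):
        J = bellman(J, P, t, drift)
    return J


def sup_dist(x, y):
    """Sup-norm distance between two cost vectors."""
    return max(abs(a - b) for a, b in zip(x, y))


def run_online_vi(drift, horizon=300, seed=0):
    """Run online VI and return the tracking errors ||J_{t+1} - J*_{t+1}||."""
    rng = random.Random(seed)
    P = make_transitions(rng)
    J = [0.0] * N_STATES
    errors = []
    for t in range(horizon):
        J = bellman(J, P, t, drift)              # single update with T_t
        J_star_next = fixed_point(P, t + 1, drift)
        errors.append(sup_dist(J, J_star_next))
    return errors
```

For example, `run_online_vi(drift=0.01)` returns a list of per-step errors whose tail should stay below `0.01 / (1 - ALPHA) ** 2`, while `run_online_vi(drift=0.0)` reduces to classical VI on a static mapping and its error decays geometrically to zero. Recomputing the fixed point at every step is done only to measure the error; an online algorithm would not have access to it.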
Abstract
This paper addresses abstract dynamic programming (DP) in the online scenario, where the abstract DP mapping is time-varying rather than static. In this case, the optimal costs and policies at different time instants generally differ, and the problem amounts to tracking time-varying optimal costs and policies, which is of interest in many practical problems. It is thus necessary to analyze the performance of the classical value iteration (VI) and policy iteration (PI) algorithms in the online case. To this end, this paper develops and theoretically analyzes several online algorithms, including approximate online VI, online PI, approximate online PI, online optimistic PI, approximate online optimistic PI, and asynchronous online PI and VI algorithms. It is proved that the tracking error bounds for all algorithms critically depend upon the largest difference between…
Taxonomy
Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Smart Grid Energy Management
