Model Based Reinforcement Learning with Final Time Horizon Optimization
Wei Sun, Evangelos Theodorou, Panagiotis Tsiotras

TL;DR
This paper introduces a model-based reinforcement learning algorithm that optimizes the control policy and the final time horizon simultaneously. Grounded in optimal control theory and dynamic programming, the method recovers the theoretical optimal solution on a low-dimensional linear problem and is applied to nonlinear systems.
Contribution
It develops an algorithm for trajectory optimization with a free final time horizon. The algorithm generalizes previous model-based trajectory optimization results and recovers the theoretical optimum on a linear problem.
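For context, the block below gives a standard textbook statement of the free-final-time optimal control problem. It is background from classical optimal control, not necessarily the paper's exact formulation or notation. The final time t_f is a decision variable alongside the control u(·). The value function V is propagated backward in time by the Hamilton-Jacobi-Bellman equation. The extra transversality condition on the Hamiltonian H is what determines the optimal t_f.

```latex
\begin{aligned}
&\min_{u(\cdot),\,t_f}\; J = \phi\big(x(t_f), t_f\big) + \int_0^{t_f} L\big(x(t), u(t), t\big)\,dt,
\qquad \dot{x} = f(x, u, t),\; x(0) = x_0, \\
&-\frac{\partial V}{\partial t}(x,t) = \min_{u}\Big[ L(x,u,t) + \nabla_x V(x,t)^{\top} f(x,u,t) \Big],
\qquad V(x, t_f) = \phi(x, t_f), \\
&H\big(x(t_f), u(t_f), \lambda(t_f), t_f\big) + \frac{\partial \phi}{\partial t_f}\big(x(t_f), t_f\big) = 0,
\qquad H = L + \lambda^{\top} f,\; \lambda(t_f) = \nabla_x \phi\big(x(t_f), t_f\big).
\end{aligned}
```

When the terminal cost φ does not depend explicitly on t_f, the last condition reduces to H = 0 at the optimal final time.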
Findings
Recovers the theoretical optimal solution on a low-dimensional linear problem
Generalizes previous trajectory optimization results
Successfully applied to nonlinear systems
Abstract
We present one of the first algorithms for model-based reinforcement learning and trajectory optimization with a free final time horizon. Grounded in optimal control theory and Dynamic Programming, we derive a set of backward differential equations that propagate the value function and provide the optimal control policy and the optimal time horizon. The resulting policy generalizes previous results in model-based trajectory optimization. Our analysis shows that the proposed algorithm recovers the theoretical optimal solution on a low-dimensional linear problem. Finally, we provide application results on nonlinear systems.
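The sketch below illustrates the free-final-time trade-off on a hypothetical scalar linear-quadratic problem. It is not the authors' algorithm. The system is dx/dt = a x + b u, with cost ∫(½ q x² + ½ r u² + c) dt + ½ s x(T)², where c > 0 is an assumed running time penalty. For each candidate horizon T, the code integrates the Riccati equation backward to obtain the optimal cost-to-go ½ p(0;T) x0². It then adds c T and picks the best T by brute-force grid search. The grid search is a transparent stand-in. The paper instead derives backward differential equations that yield the optimal horizon directly. All function names and parameter values here are assumptions made for illustration.

```python
import math


def riccati_p0(T, a, b, q, r, s, steps=400):
    """Integrate the scalar Riccati ODE backward from t = T to t = 0 with RK4.

    ODE: -dp/dt = q + 2*a*p - (b**2/r)*p**2 with terminal condition p(T) = s.
    In reversed time tau = T - t this becomes dp/dtau = q + 2ap - (b^2/r)p^2.
    """
    h = T / steps

    def rhs(p):
        return q + 2.0 * a * p - (b * b / r) * p * p

    p = s
    for _ in range(steps):
        k1 = rhs(p)
        k2 = rhs(p + 0.5 * h * k1)
        k3 = rhs(p + 0.5 * h * k2)
        k4 = rhs(p + h * k3)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p  # p(0; T)


def total_cost(T, x0, a, b, q, r, s, c):
    """Optimal LQ cost-to-go from x0 for horizon T plus the time penalty c*T."""
    return 0.5 * riccati_p0(T, a, b, q, r, s) * x0 ** 2 + c * T


def best_horizon(x0, a, b, q, r, s, c, T_max=5.0, n=501):
    """Brute-force search over a uniform grid of horizons in [0, T_max]."""
    grid = [T_max * i / (n - 1) for i in range(n)]
    return min(grid, key=lambda T: total_cost(T, x0, a, b, q, r, s, c))


if __name__ == "__main__":
    # Hypothetical example values; a = q = 0 admits a closed-form check.
    T_star = best_horizon(x0=2.0, a=0.0, b=1.0, q=0.0, r=1.0, s=10.0, c=0.5)
    print(f"grid-search horizon: {T_star:.2f}")
```

With a = q = 0, the Riccati equation has the closed form p(0;T) = 1/(1/s + kT), where k = b²/r. Setting the derivative of the total cost to zero gives T* = (sqrt(x0² k / (2c)) - 1/s) / k. For x0 = 2, b = r = 1, s = 10, c = 0.5 this yields T* = 1.9, which the grid search should reproduce to within one grid step. The same value follows from the free-final-time condition H = 0 at t = T.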
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Smart Grid Energy Management
