Model Based Reinforcement Learning with Final Time Horizon Optimization

Wei Sun; Evangelos Theodorou; Panagiotis Tsiotras

arXiv:1509.01186 · cs.SY · September 4, 2015 · 2 cites

Open Access

TL;DR

This paper presents a model-based reinforcement learning algorithm that optimizes the control policy and the final time horizon simultaneously. The method is grounded in optimal control theory and dynamic programming, recovers the theoretical optimal solution on a low-dimensional linear problem, and is applied to nonlinear systems.

Contribution

It develops an algorithm for trajectory optimization with a free final time horizon. The resulting policy generalizes earlier model-based trajectory optimization results and recovers the theoretical optimal solution on a low-dimensional linear problem.

Findings

01 Recovers the theoretical optimal solution on linear problems

02 Generalizes previous trajectory optimization results

03 Successfully applied to nonlinear systems

Abstract

We present one of the first algorithms for model-based reinforcement learning and trajectory optimization with a free final time horizon. Grounded in optimal control theory and Dynamic Programming, we derive a set of backward differential equations that propagate the value function and provide the optimal control policy and the optimal time horizon. The resulting policy generalizes previous results in model-based trajectory optimization. Our analysis shows that the proposed algorithm recovers the theoretical optimal solution on a low-dimensional linear problem. Finally, we provide application results on nonlinear systems.
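
To make the idea of propagating a value function backward and choosing the horizon concrete, here is a minimal sketch. It is not the paper's algorithm. It assumes a scalar, time-invariant linear system dx/dt = a x + b u with cost ∫(q x² + r u² + c) dt + s_f x(T)², where the constant time penalty c and all function and parameter names are illustrative assumptions. Under those assumptions the fixed-horizon value is P(T) x0² + c T, with P obtained from a scalar Riccati equation written in time-to-go, and the horizon is then picked by a simple grid search.

```python
import numpy as np

def riccati_time_to_go(a, b, q, r, s_f, horizons):
    """Integrate dP/dtau = 2aP - (b^2/r)P^2 + q with P(0) = s_f using RK4.

    tau is the time-to-go, so P[k] is the value-function coefficient for a
    problem whose horizon is horizons[k]. Assumes a scalar, time-invariant
    system (an assumption of this sketch).
    """
    def f(P):
        return 2.0 * a * P - (b ** 2 / r) * P ** 2 + q

    P = np.empty_like(horizons)
    P[0] = s_f
    for k in range(len(horizons) - 1):
        h = horizons[k + 1] - horizons[k]
        k1 = f(P[k])
        k2 = f(P[k] + 0.5 * h * k1)
        k3 = f(P[k] + 0.5 * h * k2)
        k4 = f(P[k] + h * k3)
        P[k + 1] = P[k] + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return P

def best_horizon(x0, c, a, b, q, r, s_f, t_max=10.0, n=10001):
    """Grid-search the horizon T that minimizes P(T) * x0^2 + c * T."""
    horizons = np.linspace(0.0, t_max, n)
    P = riccati_time_to_go(a, b, q, r, s_f, horizons)
    cost = P * x0 ** 2 + c * horizons
    i = int(np.argmin(cost))
    return horizons[i], cost[i], P

# Example: a = 0, b = 1, q = 0, r = 1, s_f = 1 gives P(T) = 1 / (1 + T),
# so the cost x0^2 / (1 + T) + c * T is minimized at T = x0 / sqrt(c) - 1.
T_star, J_star, P = best_horizon(x0=3.0, c=1.0, a=0.0, b=1.0,
                                 q=0.0, r=1.0, s_f=1.0)
print(T_star, J_star)
```

In this toy setting the feedback law for the chosen horizon would be u(t) = -(b / r) P(T - t) x(t). The grid search stands in for the horizon update that the paper derives from its backward equations, which is not reproduced here.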

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Smart Grid Energy Management