Gaining efficiency in deep policy gradient method for continuous-time optimal control problems
Arash Fahim, Md. Arafatur Rahman

TL;DR
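This paper introduces an efficient multi-scale deep policy gradient method for continuous-time optimal control. The method manages the allocation of computational resources, the number of trajectories, and neural network complexity across progressively finer time discretizations, and is demonstrated on linear-quadratic problems.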
Contribution
It presents a novel multi-scale approach that manages computational resources and neural network complexity for continuous-time control problems.
Findings
Effective resource allocation improves training efficiency.
The method achieves accurate policies on linear-quadratic control problems.
Theoretical results guide optimal resource distribution.
Abstract
In this paper, we propose an efficient implementation of the deep policy gradient method (PGM) for optimal control problems in continuous time. The proposed method can manage the allocation of computational resources, the number of trajectories, and the complexity of the neural network architecture. This is particularly important for continuous-time problems that require a fine time discretization. Each step of this method focuses on a different time scale and learns a policy, modeled by a neural network, for a discretized optimal control problem. The first step has the coarsest time discretization. As we proceed to other steps, the time discretization becomes finer. The optimal trained policy in each step is also used to provide data for the next step. We accompany the multi-scale deep PGM with a theoretical result on the allocation of computational resources to obtain a targeted…
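The coarse-to-fine loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and everything specific in it is an assumption made for illustration: the scalar linear-quadratic problem (dynamics dx = u dt + σ dW, running cost x² + u², terminal cost x²), the per-interval linear feedback gains that stand in for a neural network policy, the hand-written backward pass that stands in for backpropagation, and the hypothetical `LEVELS` budget of time steps, iterations, and trajectories per level. The coarse policy is copied directly onto the finer grid as a warm start, which is a simplification of the paper's use of the trained policy to provide data for the next step.

```python
import random

T, X0, SIGMA = 1.0, 1.0, 0.1  # assumed horizon, initial state, noise level


def batch_cost_and_grad(gains, n_traj, rng):
    """Average cost and pathwise gradient w.r.t. the gains of u = -k_i * x."""
    n = len(gains)
    dt = T / n
    cost_sum, grad_sum = 0.0, [0.0] * n
    for _ in range(n_traj):
        # Forward pass: Euler-Maruyama discretization of dx = u dt + sigma dW.
        xs = [X0]
        for k in gains:
            noise = SIGMA * rng.gauss(0.0, dt ** 0.5)
            xs.append(xs[-1] * (1.0 - k * dt) + noise)
        # Running cost (x^2 + u^2) dt plus terminal cost x_T^2.
        cost_sum += sum((1.0 + k * k) * x * x * dt
                        for k, x in zip(gains, xs)) + xs[-1] ** 2
        # Backward pass: lam holds d(cost)/d(x_{i+1}) for this trajectory.
        lam = 2.0 * xs[-1]
        for i in reversed(range(n)):
            k, x = gains[i], xs[i]
            grad_sum[i] += 2.0 * k * x * x * dt - lam * x * dt
            lam = 2.0 * (1.0 + k * k) * x * dt + (1.0 - k * dt) * lam
    return cost_sum / n_traj, [g / n_traj for g in grad_sum]


def train_multiscale(levels, lr=0.1, seed=0):
    """Train gains level by level, from the coarsest grid to the finest."""
    rng = random.Random(seed)
    gains = [0.0] * levels[0][0]
    history = []
    for n_steps, n_iter, n_traj in levels:
        # Warm start: repeat each coarse gain on the finer grid.
        factor = n_steps // len(gains)
        gains = [k for k in gains for _ in range(factor)]
        dt = T / n_steps
        for _ in range(n_iter):
            cost, grad = batch_cost_and_grad(gains, n_traj, rng)
            # Dividing by dt keeps the step size comparable across grids.
            gains = [k - lr * g / dt for k, g in zip(gains, grad)]
        history.append((n_steps, cost))  # last sampled batch cost at this level
    return gains, history


# Hypothetical budget: (time steps, gradient iterations, trajectories per batch).
LEVELS = [(2, 200, 16), (4, 100, 32), (8, 50, 64)]
gains, history = train_multiscale(LEVELS)
for n_steps, cost in history:
    print(f"{n_steps} time steps: batch cost {cost:.3f}")
```

In this sketch most gradient iterations are spent on the cheap coarse grid and the finer grids start from an already reasonable policy. The budget in `LEVELS` is arbitrary and is not derived from the paper's theoretical allocation result.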
Taxonomy
Topics: Adaptive Dynamic Programming Control · Machine Learning and ELM
