Simpler near-optimal controllers through direct supervision
Douglas Tweed

TL;DR
This paper shows that learning the gradient of the cost-to-go function directly, rather than inferring it from the time derivative dJ/dt as GHJB does, yields simpler and lower-cost near-optimal controllers, demonstrated on test problems.
Contribution
It proposes a novel approach to directly learn the gradient of the cost-to-go function, simplifying controller design and reducing costs compared to GHJB.
Findings
Direct gradient learning yields simpler controllers
Method outperforms GHJB on test problems
Reduces complexity and cost of near-optimal control
Abstract
The method of generalized Hamilton-Jacobi-Bellman equations (GHJB) is a powerful way of creating near-optimal controllers by learning. It is based on the fact that if we have a feedback controller, and we learn to compute the gradient grad-J of its cost-to-go function, then we can use that gradient to define a better controller. We can then use the new controller's grad-J to define a still-better controller, and so on. Here I point out that GHJB works indirectly in the sense that it doesn't learn the best approximation to grad-J but instead learns the time derivative dJ/dt, and infers grad-J from that. I show that we can get simpler and lower-cost controllers by learning grad-J directly. To do this, we need teaching signals that report grad-J(x) for a varied set of states x. I show how to obtain these signals, using the GHJB equation to calculate one component of grad-J(x) -- the one…
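The improvement loop described in the abstract can be illustrated on the simplest possible case. The sketch below is not the paper's method or test problems: it assumes a hypothetical scalar linear plant with quadratic cost, where grad-J is available in closed form and nothing has to be learned. The values `a`, `b`, `q`, `r` and all function names are illustrative assumptions. It shows only the outer loop: evaluate grad-J for the current controller, use it to define a better controller, and repeat.

```python
import math

# Hypothetical scalar plant and cost (illustrative values, not from the paper):
#   dx/dt = a*x + b*u,   running cost = q*x**2 + r*u**2
a, b, q, r = 1.0, 1.0, 1.0, 1.0


def cost_coefficient(k):
    """Return p such that J(x) = p*x**2 for the controller u = -k*x.

    Along trajectories dJ/dt = grad-J * dx/dt = -(running cost), which
    here reduces to 2*p*(a - b*k) + q + r*k**2 = 0.
    The closed loop must be stable, i.e. a - b*k < 0.
    """
    if a - b * k >= 0:
        raise ValueError("controller must stabilize the plant")
    return (q + r * k**2) / (2.0 * (b * k - a))


def grad_J(k, x):
    """Gradient of the cost-to-go of controller u = -k*x at state x."""
    return 2.0 * cost_coefficient(k) * x


def improved_gain(k):
    """Gain of the improved controller u = -(b / (2*r)) * grad-J(x)."""
    # grad-J(x) = 2*p*x, so the new controller is u = -(b*p/r)*x.
    return b * cost_coefficient(k) / r


def policy_iteration(k0, steps):
    """Repeat evaluate-then-improve; return a list of (gain, p) pairs."""
    history = []
    k = k0
    for _ in range(steps):
        p = cost_coefficient(k)
        history.append((k, p))
        k = improved_gain(k)
    return history


if __name__ == "__main__":
    for k, p in policy_iteration(k0=3.0, steps=6):
        print(f"gain k = {k:.10f}   cost coefficient p = {p:.10f}")
    # Optimal gain from the scalar Riccati equation, for comparison.
    k_opt = (a + math.sqrt(a * a + b * b * q / r)) / b
    print(f"optimal gain    = {k_opt:.10f}")
```

In this toy case each improved controller has a cost coefficient no larger than the previous one, and the gains approach the optimal linear-quadratic gain. In the setting of the abstract, grad-J is not known in closed form and must be approximated by learning, which is where the choice between learning dJ/dt and learning grad-J directly arises.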
Taxonomy
Topics: Adaptive Dynamic Programming Control · Advanced Control Systems Optimization · Control Systems and Identification
