Simpler near-optimal controllers through direct supervision

Douglas Tweed

arXiv:0908.2859 · math.OC · August 21, 2009

TL;DR

This paper introduces a method that learns the gradient of the cost-to-go function directly, rather than through its time derivative as GHJB does, and shows on test problems that it yields simpler, lower-cost near-optimal controllers than GHJB.

Contribution

It proposes learning the gradient of the cost-to-go function directly from teaching signals, instead of inferring it from the time derivative dJ/dt as GHJB does. This yields simpler controllers with lower cost than those produced by GHJB.

Findings

01. Learning grad-J directly yields simpler controllers
02. On the test problems, the method outperforms GHJB, giving lower-cost controllers
03. Near-optimal control is achieved with less controller complexity and lower cost

Abstract

The method of generalized Hamilton-Jacobi-Bellman equations (GHJB) is a powerful way of creating near-optimal controllers by learning. It is based on the fact that if we have a feedback controller, and we learn to compute the gradient grad-J of its cost-to-go function, then we can use that gradient to define a better controller. We can then use the new controller's grad-J to define a still-better controller, and so on. Here I point out that GHJB works indirectly in the sense that it doesn't learn the best approximation to grad-J but instead learns the time derivative dJ/dt, and infers grad-J from that. I show that we can get simpler and lower-cost controllers by learning grad-J directly. To do this, we need teaching signals that report grad-J(x) for a varied set of states x. I show how to obtain these signals, using the GHJB equation to calculate one component of grad-J(x) -- the one…
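
The improvement loop described in the abstract (learn a controller's grad-J, use it to define a better controller, repeat) can be sketched on a hypothetical scalar linear-quadratic problem, dx/dt = a*x + b*u with running cost q*x^2 + r*u^2, which is not taken from the paper. The sketch assumes the standard GHJB setting of control-affine dynamics and quadratic control cost, where a controller's grad-J defines the improved controller u = -(1/2) R^-1 g(x)^T grad-J(x). The teaching signals for grad-J are obtained here by finite differences of simulated cost-to-go, an illustrative stand-in rather than the paper's GHJB-based construction, and grad-J is then fitted to them directly by least squares. The function names cost_to_go, fit_grad_j and improve are hypothetical.

```python
import math

# Hypothetical scalar test problem (not from the paper):
#   dynamics dx/dt = A*x + B*u,  running cost Q*x**2 + R*u**2
A, B, Q, R = 1.0, 1.0, 1.0, 1.0


def cost_to_go(k, x0, dt=1e-3, horizon=20.0):
    """Simulate the linear controller u = -k*x from x0 with forward Euler
    and accumulate the running cost; this approximates J(x0)."""
    x, total = x0, 0.0
    for _ in range(int(horizon / dt)):
        u = -k * x
        total += (Q * x * x + R * u * u) * dt
        x += (A * x + B * u) * dt
    return total


def fit_grad_j(k, states, eps=1e-3):
    """Direct supervision of grad-J: teaching signals from central finite
    differences of simulated cost (an assumed stand-in), then a least-squares
    fit of the one-parameter model grad-J(x) ~= w*x. Returns w."""
    targets = [(cost_to_go(k, x + eps) - cost_to_go(k, x - eps)) / (2 * eps)
               for x in states]
    return (sum(t * x for t, x in zip(targets, states))
            / sum(x * x for x in states))


def improve(w):
    """Policy improvement u = -(1/2) R^-1 B grad-J(x) = -(B*w/(2R)) * x;
    returns the new feedback gain B*w/(2R)."""
    return B * w / (2 * R)


k = 3.0  # initial gain; it must stabilize the loop (A - B*k < 0)
states = [-1.0, -0.5, 0.5, 1.0]  # varied set of states for teaching signals
for it in range(5):
    w = fit_grad_j(k, states)
    k = improve(w)
    print(f"iteration {it}: gain k = {k:.4f}")

# Optimal gain from the scalar algebraic Riccati equation, for comparison.
k_opt = (A + math.sqrt(A * A + B * B * Q / R)) / B
print(f"Riccati optimum: k = {k_opt:.4f}")
```

Starting from a stabilizing gain, the printed gains should settle close to the Riccati optimum within a few iterations; the small remaining gap comes from the forward-Euler discretization. The linear model grad-J(x) ~= w*x is exact here because a linear controller on a linear system has a quadratic cost-to-go; a nonlinear problem would need a richer function approximator.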

Taxonomy

Topics: Adaptive Dynamic Programming Control · Advanced Control Systems Optimization · Control Systems and Identification