Bridging Continuous-time LQR and Reinforcement Learning via Gradient Flow of the Bellman Error
Armin Gießler, Albertus Johannes Malan, Sören Hohmann

TL;DR
This paper introduces a continuous-time Bellman error approach for solving the infinite-horizon LQR problem, establishing a connection with reinforcement learning through gradient flow analysis.
Contribution
It develops a novel gradient flow method based on a continuous-time Bellman error, providing a new perspective linking LQR and reinforcement learning.
Findings
Gradient flow converges to the optimal feedback gain.
Gradient flow generates a unique trajectory consisting only of stabilizing feedback gains.
Method outperforms existing approaches in simulations.
Abstract
In this paper, we present a novel method for computing the optimal feedback gain of the infinite-horizon Linear Quadratic Regulator (LQR) problem via an ordinary differential equation. We introduce a continuous-time Bellman error, derived from the Hamilton-Jacobi-Bellman (HJB) equation, which quantifies the suboptimality of stabilizing policies and is parametrized in terms of the feedback gain. We analyze its properties, including its effective domain, smoothness, and coerciveness, and show the existence of a unique stationary point within the stability region. Furthermore, we derive a closed-form gradient expression of the Bellman error that induces a gradient flow. This flow converges to the optimal feedback gain and generates a unique trajectory which exclusively comprises stabilizing feedback policies. Additionally, this work advances interesting connections between LQR theory and…
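To make the idea of a gradient flow over feedback gains concrete, here is a minimal sketch for a scalar system. It does not implement the paper's Bellman error, whose definition is not given on this page. As a stand-in it uses the classical LQR cost for the scalar system x' = a x + b u with u = -k x, for which J(k) = p(k) x0^2 and p(k) = (q + r k^2) / (2 (b k - a)) whenever the closed loop is stable (b k > a). The function names, the parameter values, and the explicit Euler discretization of the flow dk/dt = -dp/dk are all assumptions made for illustration.

```python
import math


def cost_coefficient(k, a, b, q, r):
    """Return p(k), where J(k) = p(k) * x0**2 for the policy u = -k * x.

    The cost is finite only for stabilizing gains (a - b*k < 0);
    outside that region it is reported as infinity.
    """
    if b * k - a <= 0:
        return math.inf
    return (q + r * k * k) / (2.0 * (b * k - a))


def cost_gradient(k, a, b, q, r):
    """Closed-form derivative dp/dk, valid for stabilizing gains."""
    return (r * b * k * k - 2.0 * r * a * k - b * q) / (2.0 * (b * k - a) ** 2)


def gradient_flow(k0, a, b, q, r, dt=0.01, steps=5000):
    """Integrate dk/dt = -dp/dk with explicit Euler steps (an assumed scheme).

    Returns the whole list of gains so the trajectory can be inspected.
    """
    gains = [k0]
    k = k0
    for _ in range(steps):
        k = k - dt * cost_gradient(k, a, b, q, r)
        gains.append(k)
    return gains


def optimal_gain(a, b, q, r):
    """Stabilizing solution of the scalar algebraic Riccati equation, as a gain."""
    return (a + math.sqrt(a * a + b * b * q / r)) / b


if __name__ == "__main__":
    # Hypothetical unstable plant (a > 0) with a stabilizing initial gain.
    a, b, q, r = 1.0, 1.0, 1.0, 1.0
    gains = gradient_flow(5.0, a, b, q, r)
    print("final gain:  ", gains[-1])
    print("optimal gain:", optimal_gain(a, b, q, r))
    print("all gains stabilizing:", all(b * k > a for k in gains))
```

In this scalar stand-in the cost grows without bound as the gain approaches the stability boundary b k = a, which is the same kind of coerciveness the abstract attributes to the Bellman error, and it is why the discretized trajectory stays inside the stabilizing region for a small enough step size.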
