Bridging Continuous-time LQR and Reinforcement Learning via Gradient Flow of the Bellman Error

Armin Gießler; Albertus Johannes Malan; Sören Hohmann

arXiv:2506.09685·eess.SY·April 17, 2026

TL;DR

This paper introduces a continuous-time Bellman error approach for solving the infinite-horizon LQR problem, establishing a connection with reinforcement learning through gradient flow analysis.

Contribution

The paper develops a gradient flow method based on a novel continuous-time Bellman error, parametrized by the feedback gain, and uses it to link LQR theory and reinforcement learning.

Findings

01 The gradient flow converges to the optimal feedback gain.

02 The flow generates a unique trajectory consisting exclusively of stabilizing feedback policies.

03 The method outperforms existing approaches in simulations.

Abstract

In this paper, we present a novel method for computing the optimal feedback gain of the infinite-horizon Linear Quadratic Regulator (LQR) problem via an ordinary differential equation. We introduce a novel continuous-time Bellman error, derived from the Hamilton-Jacobi-Bellman (HJB) equation, which quantifies the suboptimality of stabilizing policies and is parametrized in terms of the feedback gain. We analyze its properties, including its effective domain, smoothness, and coerciveness, and show the existence of a unique stationary point within the stability region. Furthermore, we derive a closed-form gradient expression of the Bellman error that induces a gradient flow. This gradient flow converges to the optimal feedback gain and generates a unique trajectory which exclusively comprises stabilizing feedback policies. Additionally, this work advances interesting connections between LQR theory and…
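
To make the idea of computing an LQR gain by following an ordinary differential equation concrete, here is a minimal sketch. It does not implement the paper's Bellman error, which is not defined in this excerpt. As a stand-in it uses the classical LQR cost J(K) = trace(P_K), whose closed-form gradient 2(RK − BᵀP_K)Y_K is well known, and integrates dK/dt = −∇J(K) with forward Euler. The function names, the double-integrator system, the initial gain, the step size, and the step count are all assumptions chosen for illustration.

```python
import numpy as np

def solve_lyapunov(A, W):
    """Solve A X + X A^T + W = 0 by Kronecker vectorization (small n only)."""
    n = A.shape[0]
    I = np.eye(n)
    # Row-major vec: vec(A X) = (A kron I) vec(X), vec(X A^T) = (I kron A) vec(X)
    M = np.kron(A, I) + np.kron(I, A)
    return np.linalg.solve(M, -W.reshape(-1)).reshape(n, n)

def cost_gradient(A, B, Q, R, K):
    """Gradient of the stand-in cost J(K) = trace(P_K) for u = -K x."""
    Acl = A - B @ K
    if np.max(np.linalg.eigvals(Acl).real) >= 0:
        raise ValueError("K is not stabilizing; J(K) is not finite")
    # P_K solves Acl^T P + P Acl + Q + K^T R K = 0
    P = solve_lyapunov(Acl.T, Q + K.T @ R @ K)
    # Y_K solves Acl Y + Y Acl^T + I = 0
    Y = solve_lyapunov(Acl, np.eye(A.shape[0]))
    return 2.0 * (R @ K - B.T @ P) @ Y

def gradient_flow_lqr(A, B, Q, R, K0, step=0.01, n_steps=5000):
    """Forward-Euler discretization of dK/dt = -grad J(K), from stabilizing K0."""
    K = np.array(K0, dtype=float)
    for _ in range(n_steps):
        K = K - step * cost_gradient(A, B, Q, R, K)
    return K

# Hypothetical example system: a double integrator with Q = I, R = 1.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K_final = gradient_flow_lqr(A, B, Q, R, K0=[[1.0, 1.0]])
print(K_final)  # approaches [[1, sqrt(3)]], the Riccati-optimal gain for this system
```

For this system the algebraic Riccati equation can be solved by hand, giving K = [1, √3], so the printed gain can be checked directly. Every iterate is tested for stability inside `cost_gradient`, which mirrors, for the stand-in cost only, the property that the flow stays within the set of stabilizing gains.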
