Actor-Critic Learning for Risk-Constrained Linear Quadratic Regulation

Weijian Li; Andreas A. Malikopoulos

arXiv:2510.22267·math.OC·October 28, 2025

TL;DR

This paper introduces a model-free actor-critic algorithm for risk-constrained linear quadratic regulation (LQR) that learns online while enforcing a statistical constraint on the variability of the system state.

Contribution

The paper reformulates the risk-constrained LQR problem as a max-min optimization problem and derives from it a multi-time-scale stochastic approximation scheme for online policy optimization.
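For orientation, a max-min reformulation of a constrained problem typically takes the Lagrangian form sketched below. The symbols are assumptions made here for illustration and are not taken from the paper: $K$ is the policy parameter, $J(K)$ the LQR cost, $J_c(K)$ the risk measure, $\delta$ the risk tolerance, and $\lambda$ the dual variable. The paper's exact cost and risk definitions may differ.

```latex
\begin{aligned}
&\min_{K}\; J(K) \quad \text{subject to} \quad J_c(K) \le \delta, \\
&\mathcal{L}(K,\lambda) = J(K) + \lambda\,\bigl(J_c(K) - \delta\bigr), \\
&\max_{\lambda \ge 0}\; \min_{K}\; \mathcal{L}(K,\lambda).
\end{aligned}
```

Under this reading, the inner minimization over $K$ is the policy optimization step and the outer maximization over $\lambda$ adjusts the penalty on constraint violation.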

Findings

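01

Policy evaluation: the critic estimates the action-value function via temporal-difference learning

02

Policy improvement: the actor updates the policy parameters with a policy gradient step

03

Constraint enforcement: the dual variable is adapted by gradient ascent to enforce the risk constraint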

Abstract

In this paper, we investigate the infinite-horizon risk-constrained linear quadratic regulator problem (RC-LQR), which augments the classical LQR formulation with a statistical constraint on the variability of the system state to incorporate risk awareness, a key requirement in safety-critical control applications. We propose an actor-critic learning algorithm that jointly performs policy evaluation and policy improvement in a model-free and online manner. The RC-LQR problem is first reformulated as a max-min optimization problem, from which we develop a multi-time-scale stochastic approximation scheme. The critic employs temporal-difference learning to estimate the action-value function, the actor updates the policy parameters via a policy gradient step, and the dual variable is adapted through gradient ascent to enforce the risk constraint.
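The three coupled updates described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm; every concrete choice here is an assumption made for illustration: the function name `rc_lqr_actor_critic`, the average-cost (differential) TD error, the quadratic critic features, the stage risk proxy `x' Q_risk x` with tolerance `delta`, the step sizes `alpha > beta > gamma` (critic fastest, dual slowest), and the projection of the gain `K` onto a box. The matrices `A` and `B` are used only to simulate the plant; the learner itself sees only states, inputs, and costs.

```python
import numpy as np

def quad_features(z):
    """Monomials z_i * z_j (i <= j) of the stacked vector z = [x; u]."""
    i, j = np.triu_indices(len(z))
    return z[i] * z[j]

def theta_to_H(theta, d):
    """Symmetric H such that z' H z == theta @ quad_features(z)."""
    U = np.zeros((d, d))
    U[np.triu_indices(d)] = theta
    return 0.5 * (U + U.T)

def rc_lqr_actor_critic(A, B, Q, R, Q_risk, delta, steps=3000, seed=0,
                        alpha=0.02, beta=0.005, gamma=0.0005,
                        k_max=0.7, sigma_w=0.1, sigma_e=0.3):
    rng = np.random.default_rng(seed)
    n, m = B.shape
    K = np.zeros((m, n))                       # actor: policy u = -K x
    theta = np.zeros((n + m) * (n + m + 1) // 2)  # critic weights
    eta = 0.0                                  # running average-cost estimate
    lam = 0.0                                  # dual variable
    x = np.zeros(n)
    u = -K @ x + sigma_e * rng.standard_normal(m)  # exploration noise
    for _ in range(steps):
        cost = x @ Q @ x + u @ R @ u
        risk = x @ Q_risk @ x                  # assumed stage risk proxy
        lagr = cost + lam * (risk - delta)     # Lagrangian stage cost
        # Simulator step (the model is hidden from the learner).
        x_next = A @ x + B @ u + sigma_w * rng.standard_normal(n)
        u_next = -K @ x_next + sigma_e * rng.standard_normal(m)

        # Critic (fast time scale): differential TD(0) on the Lagrangian cost.
        phi = quad_features(np.concatenate([x, u]))
        phi_next = quad_features(np.concatenate([x_next, u_next]))
        td = lagr - eta + theta @ phi_next - theta @ phi
        theta = theta + alpha * td * phi
        eta = eta + alpha * (lagr - eta)

        # Actor (slower): gradient of the critic w.r.t. K at u = -K x,
        # followed by projection onto the box |K_ij| <= k_max.
        H = theta_to_H(theta, n + m)
        H_ux, H_uu = H[n:, :n], H[n:, n:]
        grad_K = 2.0 * (H_uu @ K - H_ux) @ np.outer(x, x)
        K = np.clip(K - beta * grad_K, -k_max, k_max)

        # Dual (slowest): projected gradient ascent on the constraint violation.
        lam = max(0.0, lam + gamma * (risk - delta))

        x, u = x_next, u_next
    return K, lam, theta

# Hypothetical scalar plant; with k_max = 0.7 the closed loop 0.6 - 0.5*K
# stays inside [0.25, 0.95], so the simulation remains stable.
A = np.array([[0.6]])
B = np.array([[0.5]])
Q = np.eye(1)
R = np.eye(1)
K, lam, theta = rc_lqr_actor_critic(A, B, Q, R, Q_risk=np.eye(1), delta=0.05)
print("gain:", K, "dual variable:", lam)
```

The separation of step sizes is what makes the scheme multi-time-scale: the critic sees an almost fixed policy, and the actor sees an almost fixed dual variable. The box projection is a safeguard added for this sketch so that the example cannot destabilize the plant while the critic is still inaccurate.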
