Actor-Critic Learning for Risk-Constrained Linear Quadratic Regulation
Weijian Li, Andreas A. Malikopoulos

TL;DR
This paper introduces a model-free actor-critic algorithm for risk-constrained linear quadratic regulation that learns a control policy online while enforcing a risk constraint on the variability of the system state.
Contribution
The paper reformulates the risk-constrained LQR problem as a max-min optimization problem and derives from it a multi-time-scale stochastic approximation scheme for online policy optimization.
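As a rough illustration of what a max-min reformulation of a constrained control problem typically looks like, consider the standard Lagrangian saddle-point form below. The symbols are generic placeholders assumed for this sketch and are not taken from the paper: K is the policy parameter, J(K) the LQR cost, J_c(K) the risk measure, \rho the risk budget, and \lambda the dual variable.

```latex
% Constrained problem (generic form, assumed for illustration):
%   \min_K J(K) \quad \text{subject to} \quad J_c(K) \le \rho
% Lagrangian max-min counterpart:
\max_{\lambda \ge 0} \; \min_{K} \; \mathcal{L}(K, \lambda),
\qquad
\mathcal{L}(K, \lambda) = J(K) + \lambda \bigl( J_c(K) - \rho \bigr)
```

In this form the inner minimization over K corresponds to policy optimization and the outer maximization over \lambda raises the penalty whenever the constraint is violated.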
Findings
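Policy evaluation: the critic estimates the action-value function with temporal-difference learning
Policy improvement: the actor updates the policy parameters with a policy gradient step
Constraint enforcement: the dual variable is adapted by gradient ascent to enforce the risk constraint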
Abstract
In this paper, we investigate the infinite-horizon risk-constrained linear quadratic regulator problem (RC-QR), which augments the classical LQR formulation with a statistical constraint on the variability of the system state to incorporate risk awareness, a key requirement in safety-critical control applications. We propose an actor-critic learning algorithm that jointly performs policy evaluation and policy improvement in a model-free and online manner. The RC-QR problem is first reformulated as a max-min optimization problem, from which we develop a multi-time-scale stochastic approximation scheme. The critic employs temporal-difference learning to estimate the action-value function, the actor updates the policy parameters via a policy gradient step, and the dual variable is adapted through gradient ascent to enforce the risk constraint.
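The three coupled updates described in the abstract can be sketched on a toy scalar system. This is a minimal illustration under stated assumptions, not the paper's algorithm: the scalar dynamics, the quadratic feature map, the step-size constants and exponents, the clipping range for the gain, the risk signal (squared deviation of x^2 from its running mean), and the deterministic-policy-gradient-style actor step are all hypothetical choices made for this example.

```python
import random

def features(x, u):
    """Quadratic features of a scalar state-action pair (assumed form)."""
    return [x * x, x * u, u * u]

def q_value(theta, x, u):
    """Linear-in-features action-value estimate."""
    return sum(th * f for th, f in zip(theta, features(x, u)))

def step_sizes(t):
    """Decaying step sizes: critic fastest, actor slower, dual slowest.
    Constants and exponents are illustrative assumptions."""
    n = t + 1
    return 0.05 * n ** -0.6, 0.01 * n ** -0.8, 0.01 * n ** -1.0

def update(theta, eta, k, lam, x, u, x_next, u_next, cost, risk, rho, t,
           k_max=1.5):
    """One critic / actor / dual update from a single transition."""
    alpha, beta, gamma = step_sizes(t)
    stage = cost + lam * (risk - rho)            # Lagrangian stage cost
    # Critic: average-cost TD(0) on the action-value function.
    delta = stage - eta + q_value(theta, x_next, u_next) - q_value(theta, x, u)
    phi = features(x, u)
    theta = [th + alpha * delta * f for th, f in zip(theta, phi)]
    eta = eta + alpha * (stage - eta)            # running average-cost estimate
    # Actor: descend dQ/du * du/dk for the linear policy u = -k * x.
    u_pol = -k * x
    dq_du = theta[1] * x + 2.0 * theta[2] * u_pol
    grad_k = dq_du * (-x)
    k = min(k_max, max(0.0, k - beta * grad_k))  # clip to an assumed safe range
    # Dual: projected gradient ascent on the constraint violation.
    lam = max(0.0, lam + gamma * (risk - rho))
    return theta, eta, k, lam

def run(steps=2000, seed=0, a=0.9, b=1.0, q=1.0, r=0.1, rho=0.05):
    """Simulate x' = a*x + b*u + w and apply the updates online."""
    rng = random.Random(seed)
    theta, eta, k, lam = [0.0, 0.0, 0.0], 0.0, 0.5, 0.0
    mean_sq = 0.0                                # running mean of x^2
    x = 0.0
    u = -k * x + 0.1 * rng.gauss(0.0, 1.0)       # exploration noise on the input
    for t in range(steps):
        x_next = a * x + b * u + 0.1 * rng.gauss(0.0, 1.0)
        u_next = -k * x_next + 0.1 * rng.gauss(0.0, 1.0)
        cost = q * x * x + r * u * u
        mean_sq += (x * x - mean_sq) / (t + 1)
        dev = x * x - mean_sq
        risk = dev * dev                         # placeholder risk signal
        theta, eta, k, lam = update(theta, eta, k, lam, x, u,
                                    x_next, u_next, cost, risk, rho, t)
        x, u = x_next, u_next
    return theta, eta, k, lam

if __name__ == "__main__":
    theta, eta, k, lam = run()
    print("gain k:", k, "dual variable lambda:", lam)
```

The ordering of the step sizes is the point of the sketch: the critic tracks the action-value function of the current policy, the actor moves more slowly against that estimate, and the dual variable moves slowest, so each level sees the faster ones as approximately converged. The clipping of k keeps the toy closed loop stable for the assumed values of a and b and is not part of the method as described in the abstract.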
