Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator
Wenhao Xu, Xuefeng Gao, Xuedong He

TL;DR
This paper develops regret bounds for online adaptive control of risk-sensitive linear quadratic regulators in the episodic setting. It introduces algorithms with logarithmic regret in the number of episodes under an identifiability assumption, and with square-root regret when that assumption does not hold.
Contribution
It provides the first regret bounds for episodic risk-sensitive LQR, using a novel perturbation analysis of the associated Riccati equations together with an analysis of the performance loss in online learning.
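The Riccati equations in risk-sensitive LQR differ from the standard LQR recursion by a risk adjustment of the cost-to-go matrix. The block below is a minimal sketch that assumes the classical Jacobson/Whittle form of the finite-horizon problem: dynamics $x_{t+1} = A x_t + B u_t + w_t$ with $w_t \sim N(0, W)$, stage cost $x_t^\top Q x_t + u_t^\top R u_t$, and risk parameter $\gamma \ge 0$. Under these assumptions, the backward recursion replaces $P_{t+1}$ by $\tilde P_{t+1} = (I - \gamma P_{t+1} W)^{-1} P_{t+1}$ before the usual LQR update. It also requires $W^{-1} - \gamma P_{t+1} \succ 0$. The function name `risk_sensitive_riccati`, the terminal matrix `Q_final`, and this scaling of $\gamma$ are assumptions for illustration and may not match the paper's exact parameterization.

```python
import numpy as np

def risk_sensitive_riccati(A, B, Q, R, W, gamma, horizon, Q_final=None):
    """Backward Riccati recursion for finite-horizon risk-sensitive LQR (LEQR).

    Assumed model (classical Jacobson/Whittle form, hypothetical scaling):
      x_{t+1} = A x_t + B u_t + w_t,  w_t ~ N(0, W) with W positive definite,
      cost = (2/gamma) log E exp((gamma/2) * [sum_t x'Qx + u'Ru + x_H' Q_final x_H]).
    Returns gains K_0..K_{H-1} (u_t = K_t x_t) and matrices P_0..P_H.
    """
    n = A.shape[0]
    P = np.zeros((n, n)) if Q_final is None else np.array(Q_final, dtype=float)
    L = np.linalg.cholesky(W)  # W = L L^T
    Ps, Ks = [P], []
    for _ in range(horizon):
        # Well-posedness: W^{-1} - gamma P must be positive definite, i.e.
        # every eigenvalue of L^T P L must be strictly below 1/gamma.
        if gamma > 0 and np.linalg.eigvalsh(L.T @ P @ L).max() >= 1.0 / gamma:
            raise ValueError("gamma too large: the exponential cost is infinite")
        # Risk adjustment: P_tilde = (I - gamma P W)^{-1} P (equals P when gamma = 0).
        P_tilde = np.linalg.solve(np.eye(n) - gamma * P @ W, P)
        P_tilde = 0.5 * (P_tilde + P_tilde.T)
        # Standard LQR step applied to the risk-adjusted cost-to-go.
        S = R + B.T @ P_tilde @ B
        K = -np.linalg.solve(S, B.T @ P_tilde @ A)
        P = Q + A.T @ P_tilde @ A + A.T @ P_tilde @ B @ K
        P = 0.5 * (P + P.T)
        Ps.append(P)
        Ks.append(K)
    return Ks[::-1], Ps[::-1]

# Example: scalar system; compare risk-neutral (gamma = 0) and risk-averse (gamma = 0.5).
A = np.array([[1.0]])
B = np.array([[1.0]])
Q = np.eye(1)
R = np.eye(1)
W = 0.1 * np.eye(1)
for gamma in (0.0, 0.5):
    Ks, Ps = risk_sensitive_riccati(A, B, Q, R, W, gamma, horizon=5, Q_final=np.eye(1))
    print(gamma, Ps[0][0, 0], Ks[0][0, 0])
```

Setting `gamma = 0` recovers the standard finite-horizon LQR Riccati recursion. The `ValueError` marks the regime where, under this model, the exponential cost is infinite.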
Findings
Achieves $\tilde{O}(\log N)$ regret under an identifiability assumption, where $N$ is the total number of episodes
Proposes adding exploration noise to attain $\tilde{O}(\sqrt{N})$ regret without the identifiability assumption
First regret bounds established for risk-sensitive LQR in episodic control
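The need for exploration noise can be illustrated with a generic mechanism, shown here as a minimal sketch. Under a fixed linear feedback $u_t = K x_t$, every regressor $z_t = (x_t, u_t)$ lies in an $n$-dimensional subspace. The least-squares Gram matrix $\sum_t z_t z_t^\top$ is then singular, so $[A\ B]$ cannot be recovered. Adding Gaussian noise $\sigma \eta_t$ to the input restores full rank. The system matrices, the gain `K`, the noise levels, and the helper names `regressor_gram` and `conditioning` are hypothetical. This stationary-gain example only illustrates why identifiability can fail. It is not the paper's Assumption 1, since time-varying finite-horizon gains can change the picture, and it does not reproduce the paper's noise schedule.

```python
import numpy as np

def regressor_gram(A, B, K, sigma, n_steps=500, noise_std=1.0, seed=0):
    """Gram matrix sum_t z_t z_t^T of regressors z_t = [x_t; u_t] along one rollout.

    Inputs follow u_t = K x_t + sigma * eta_t with eta_t ~ N(0, I);
    sigma = 0 is purely greedy (certainty-equivalent) linear feedback.
    """
    rng = np.random.default_rng(seed)
    n, m = B.shape
    x = np.zeros(n)
    G = np.zeros((n + m, n + m))
    for _ in range(n_steps):
        u = K @ x + sigma * rng.standard_normal(m)
        z = np.concatenate([x, u])
        G += np.outer(z, z)
        # Process noise w_t ~ N(0, noise_std^2 I) drives the state.
        x = A @ x + B @ u + noise_std * rng.standard_normal(n)
    return G

def conditioning(G):
    """Ratio lambda_min / lambda_max; near 0 means [A B] is not identifiable from the data."""
    eig = np.linalg.eigvalsh(G)
    return eig[0] / eig[-1]

# Hypothetical 2-state, 1-input system with a fixed stabilizing gain.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
K = np.array([[-0.5, -0.2]])

greedy = conditioning(regressor_gram(A, B, K, sigma=0.0))
noisy = conditioning(regressor_gram(A, B, K, sigma=0.5))
print(f"greedy: {greedy:.1e}, with exploration noise: {noisy:.1e}")
```

The greedy ratio is zero up to rounding error, while the noisy ratio stays bounded away from zero. The cost of this excitation is extra control effort, which is consistent with the slower $\sqrt{N}$ rate when identifiability cannot be assumed.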
Abstract
The risk-sensitive linear quadratic regulator is one of the most fundamental problems in risk-sensitive optimal control. In this paper, we study online adaptive control of the risk-sensitive linear quadratic regulator in the finite-horizon episodic setting. We propose a simple least-squares greedy algorithm and show that it achieves $\tilde{O}(\log N)$ regret under a specific identifiability assumption, where $N$ is the total number of episodes. If the identifiability assumption is not satisfied, we propose incorporating exploration noise into the least-squares-based algorithm, resulting in an algorithm with $\tilde{O}(\sqrt{N})$ regret. To the best of our knowledge, this is the first set of regret bounds for the episodic risk-sensitive linear quadratic regulator. Our proof relies on perturbation analysis of less-standard Riccati equations for risk-sensitive linear quadratic…
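The least-squares step of a greedy (certainty-equivalence) scheme can be sketched as follows. The sketch assumes that data from previous episodes are stored as state trajectories `xs` and input sequences `us`, and it adds a small ridge term `lam` for numerical stability. The function name and the regularization are hypothetical, not taken from the paper. A greedy algorithm would then plug the estimates $(\hat A, \hat B)$ into the risk-sensitive Riccati recursion to obtain the feedback gains for the next episode.

```python
import numpy as np

def least_squares_estimate(xs, us, lam=1e-3):
    """Ridge-regularized least-squares estimate of (A, B) from past episodes.

    xs: list of state trajectories, each of shape (H + 1, n)
    us: list of input sequences, each of shape (H, m)
    Solves min over Theta of sum_t ||x_{t+1} - Theta z_t||^2 + lam ||Theta||_F^2
    with regressors z_t = [x_t; u_t].
    """
    Z = np.vstack([np.hstack([x[:-1], u]) for x, u in zip(xs, us)])  # (T, n + m)
    Y = np.vstack([x[1:] for x in xs])                                 # (T, n)
    n = Y.shape[1]
    gram = Z.T @ Z + lam * np.eye(Z.shape[1])
    theta = np.linalg.solve(gram, Z.T @ Y).T                           # (n, n + m)
    return theta[:, :n], theta[:, n:]

# Synthetic check on a hypothetical system; random inputs are used here only
# to make the estimation problem well-posed, not as the greedy policy itself.
rng = np.random.default_rng(1)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])
H = 10
xs, us = [], []
for _ in range(500):  # 500 episodes of length H, each starting from x_0 = 0
    x = np.zeros((H + 1, 2))
    u = rng.standard_normal((H, 1))
    for t in range(H):
        x[t + 1] = A_true @ x[t] + B_true @ u[t] + 0.1 * rng.standard_normal(2)
    xs.append(x)
    us.append(u)
A_hat, B_hat = least_squares_estimate(xs, us)
print(A_hat, B_hat, sep="\n")
```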
Peer Reviews
Decision: ICLR 2025 Poster
1. This paper initiates the study of a risk-sensitive formulation of LQR in an online episodic setting. 2. The paper rigorously characterizes the regret bounds of the proposed algorithms.
1. Although the authors mention a previous work (Basei et al. 2022) that considers an episodic control problem, it would be good to explain further why the episodic setting is important or of practical interest in control. In addition, it would be good to discuss the major challenge in moving to the non-episodic setting, which is the most standard setting in control problems (i.e., the system evolves continuously and does not reset). 2. It would be good to explai…
**The following are the strengths of the paper:** 1. This paper is the first to provide regret bounds for the episodic finite-horizon LEQR problem, which has applications in risk-sensitive control in areas such as finance and healthcare. 2. The authors propose two algorithms with sub-linear regret guarantees. The regret bounds are derived using a perturbation analysis of modified Riccati equations, which incorporate the exponential risk-sensitive cost (defined in Eq. 2). 3. Finally, th…
**The following are the weaknesses of the paper:** 1. Since verifying the identifiability assumption (Assumption 1) for a given problem may not be possible, the first algorithm may not be useful in practice. 2. Both proposed algorithms are restricted to a fixed finite horizon and linear dynamics, which limits their real-world applicability: in practice, horizon lengths can vary across episodes and the dynamics can be non-linear. 3. The following parts of the paper are not clear enough: …
The paper is well-written and easy to follow. All the assumptions are well-motivated and standard in the literature. The proposed algorithms are simple and can be easily implemented by practitioners.
*1.* I think the authors should make the dependence on gamma explicit in the two theorems, i.e., the big-O bounds should show the order in gamma, since gamma is the key parameter in the risk-sensitive LQR problems studied. With the gamma dependence in the results, the authors could compare the regret order with that for risk-neutral MDP/LQR problems. Discussion of, or guidance on, the choice of gamma could also be included. *2.* The literature review and discussion of risk-averse RL/LQR problems are limited, in particular regarding the choice of risk-averse metrics
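The role of gamma raised in this review can be made concrete with the entropic (exponential) cost $\frac{1}{\gamma}\log \mathbb{E}[e^{\gamma C}]$ of a scalar random cost $C$. This cost tends to the risk-neutral value $\mathbb{E}[C]$ as $\gamma \to 0$. For Gaussian $C \sim N(\mu, \sigma^2)$, it equals $\mu + \gamma\sigma^2/2$ exactly, so $\gamma$ sets how heavily variability is penalized. The helper `entropic_cost` below is hypothetical. The paper's risk-sensitive LQR cost may scale $\gamma$ differently (for example, by factors of 2).

```python
import numpy as np

def entropic_cost(costs, gamma):
    """Empirical risk-sensitive cost (1/gamma) * log E[exp(gamma * C)].

    gamma > 0 penalizes variability (risk-averse); gamma = 0 is the
    risk-neutral limit E[C].
    """
    costs = np.asarray(costs, dtype=float)
    if gamma == 0.0:
        return float(costs.mean())
    a = gamma * costs
    a_max = a.max()  # log-mean-exp shifted by the maximum for numerical stability
    return float((a_max + np.log(np.mean(np.exp(a - a_max)))) / gamma)

# Gaussian check: for C ~ N(mu, sigma^2) the exact value is mu + gamma * sigma^2 / 2.
rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5
samples = rng.normal(mu, sigma, size=200_000)
for gamma in (0.0, 0.5, 1.0):
    print(gamma, entropic_cost(samples, gamma), mu + gamma * sigma**2 / 2)
```

With these samples and gamma = 0.5, the estimate should be close to 1 + 0.5 * 0.25 / 2 = 1.0625. The empirical value is nondecreasing in gamma, which is one way to see why regret constants for the risk-sensitive problem can be expected to depend on gamma.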
Taxonomy
Topics: Fault Detection and Control Systems · Advanced Control Systems Optimization
Methods: Sparse Evolutionary Training
