Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator

Wenhao Xu; Xuefeng Gao; Xuedong He

arXiv:2406.05366·cs.LG·February 14, 2025·1 cites

TL;DR

This paper develops regret bounds for online adaptive control of risk-sensitive linear quadratic regulators in episodic settings, introducing algorithms with logarithmic and square-root regret guarantees under different assumptions.

Contribution

It provides the first regret bounds for episodic risk-sensitive LQR, using novel analysis of Riccati equations and performance loss in online learning.

Findings

01

Achieves $\tilde{O}(\log N)$ regret under identifiability

02

Proposes exploration noise to attain $\tilde{O}(\sqrt{N})$ regret without identifiability

03

First regret bounds established for risk-sensitive LQR in episodic control
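The second finding combines least-squares system identification with injected exploration noise. The following is a minimal sketch of that idea, not the paper's algorithm: the dynamics $x_{t+1} = A x_t + B u_t + w_t$ with $w_t \sim N(0, I)$, the noise schedule `k ** -0.25`, the ridge term `reg`, the zero placeholder gains, and all function names are assumptions made for illustration.

```python
import numpy as np

def run_episode(A, B, gains, explore_std, rng):
    """Simulate one episode from x_0 = 0 and record (x_t, u_t, x_{t+1}) triples."""
    n, m = B.shape
    x = np.zeros(n)
    xs, us, nxt = [], [], []
    for K in gains:
        # Linear feedback plus Gaussian exploration noise on the input.
        u = K @ x + explore_std * rng.standard_normal(m)
        # Assumed process noise w_t ~ N(0, I).
        x_next = A @ x + B @ u + rng.standard_normal(n)
        xs.append(x)
        us.append(u)
        nxt.append(x_next)
        x = x_next
    return np.array(xs), np.array(us), np.array(nxt)

def least_squares_estimate(xs, us, nxt, reg=1e-6):
    """Regress x_{t+1} on (x_t, u_t) to estimate [A B]; reg is a small ridge term."""
    Z = np.hstack([xs, us])                      # shape (samples, n + m)
    gram = Z.T @ Z + reg * np.eye(Z.shape[1])
    theta = np.linalg.solve(gram, Z.T @ nxt).T   # shape (n, n + m)
    n = xs.shape[1]
    return theta[:, :n], theta[:, n:]

def demo(num_episodes=200, horizon=5, seed=0):
    rng = np.random.default_rng(seed)
    A = np.array([[0.9, 0.2], [0.0, 0.8]])       # hypothetical true system
    B = np.array([[0.0], [1.0]])
    # Placeholder gains; an adaptive scheme would recompute them every episode
    # from the current estimates.
    gains = [np.zeros((1, 2))] * horizon
    data = [[], [], []]
    for k in range(1, num_episodes + 1):
        explore_std = k ** -0.25                 # assumed decaying schedule
        episode = run_episode(A, B, gains, explore_std, rng)
        for store, arr in zip(data, episode):
            store.append(arr)
    xs, us, nxt = (np.vstack(d) for d in data)
    A_hat, B_hat = least_squares_estimate(xs, us, nxt)
    return A, B, A_hat, B_hat
```

Calling `demo()` returns the hypothetical true matrices together with their estimates, so the estimation error can be inspected directly. The exploration noise is what keeps the input direction excited when the feedback gains alone would not.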

Abstract

Risk-sensitive linear quadratic regulator is one of the most fundamental problems in risk-sensitive optimal control. In this paper, we study online adaptive control of risk-sensitive linear quadratic regulator in the finite horizon episodic setting. We propose a simple least-squares greedy algorithm and show that it achieves $\tilde{O}(\log N)$ regret under a specific identifiability assumption, where $N$ is the total number of episodes. If the identifiability assumption is not satisfied, we propose incorporating exploration noise into the least-squares-based algorithm, resulting in an algorithm with $\tilde{O}(\sqrt{N})$ regret. To our best knowledge, this is the first set of regret bounds for episodic risk-sensitive linear quadratic regulator. Our proof relies on perturbation analysis of less-standard Riccati equations for risk-sensitive linear quadratic…
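To make the "greedy" step concrete, here is a hedged sketch of how feedback gains could be computed from estimates of $(A, B)$ using the classical finite-horizon risk-sensitive Riccati recursion. It assumes the cost convention $\frac{1}{\gamma}\log \mathbb{E}\exp\big(\frac{\gamma}{2}\sum_t (x_t^\top Q x_t + u_t^\top R u_t) + \frac{\gamma}{2} x_T^\top Q_T x_T\big)$ with process noise $w_t \sim N(0, W)$; the paper's exact scaling may differ, and the function name and arguments are hypothetical.

```python
import numpy as np

def risk_sensitive_gains(A, B, Q, R, Q_T, W, gamma, horizon):
    """Backward recursion for risk-sensitive LQR gains (u_t = K_t x_t).

    Assumed convention: exponential cost with factor gamma / 2 and noise
    covariance W. Setting gamma = 0 recovers the standard LQR recursion.
    """
    n = A.shape[0]
    P = Q_T
    gains = []
    for _ in range(horizon):
        # Well-posedness: W^{-1} - gamma * P must be positive definite, which
        # is equivalent to all eigenvalues of gamma * P @ W lying below 1.
        if np.max(np.linalg.eigvals(gamma * P @ W).real) >= 1.0:
            raise ValueError("risk parameter gamma too large for this problem")
        # Risk-adjusted cost-to-go: P_tilde = (I - gamma * P W)^{-1} P.
        P_tilde = np.linalg.solve(np.eye(n) - gamma * P @ W, P)
        P_tilde = 0.5 * (P_tilde + P_tilde.T)    # symmetrize against round-off
        S = R + B.T @ P_tilde @ B
        K = -np.linalg.solve(S, B.T @ P_tilde @ A)
        # Equals Q + A'P~A - A'P~B S^{-1} B'P~A because K = -S^{-1} B'P~A.
        P = Q + A.T @ P_tilde @ A + A.T @ P_tilde @ B @ K
        P = 0.5 * (P + P.T)
        gains.append(K)
    gains.reverse()                              # gains[t] is the gain at time t
    return gains, P
```

In a certainty-equivalent scheme, `A` and `B` would be replaced by their current least-squares estimates at the start of each episode. In the scalar case with all parameters equal to 1 and a single step, `gamma = 0.5` gives gain $-2/3$ versus $-1/2$ at `gamma = 0`, so the risk-averse controller acts more aggressively.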

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. This paper initiates the study of the risk-sensitive formulation of LQR in an online episodic setting. 2. The paper rigorously characterizes the regret bounds of the proposed algorithms.

Weaknesses

1. Although the authors mention a previous work, Basei et al. 2022, that considers an episodic control problem, it would be good to further explain why the episodic setting is important or of practical interest in control. In addition, it would be good to discuss the major challenges of moving to the non-episodic setting, which is the most standard setting in control problems (i.e., the system evolves continuously and does not reset). 2. It would be good to explai

Reviewer 02 · Rating 8 · Confidence 3

Strengths

**The following are the strengths of the paper:** 1. This paper is the first to provide regret bounds for the episodic finite-horizon LEQR problem, which has applications in risk-sensitive control problems in areas like finance and healthcare. 2. The authors propose two algorithms with sublinear regret guarantees. The regret bounds are derived using perturbation analysis of modified Riccati equations, which incorporate the exponential risk-sensitive cost (defined in Eq. 2). 3. Finally, th

Weaknesses

**The following are the weaknesses of the paper:** 1. Since verifying the identifiability assumption (Assumption 1) for a given problem may not be possible, the first algorithm may not be useful in practice. 2. Both proposed algorithms are restricted to fixed finite-horizon settings and linear dynamics, which limits their real-world applicability, since horizon lengths can vary across episodes and many problems have non-linear dynamics. 3. The following parts of the paper are not clear enough: -

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The paper is well-written and easy to follow. All the assumptions are well-motivated and standard in the literature. The proposed algorithms are simple and can be easily implemented by practitioners.

Weaknesses

*1.* I think the authors should make the dependence on gamma explicit in the two theorems, i.e., the big O should contain the orders of gamma, as gamma is the key parameter in the studied risk-sensitive LQR problems. With the gamma dependence in the results, the authors could compare the regret order with risk-neutral MDP/LQR problems. Discussion of, or guidance on, the selection of gamma could be included. *2.* Literature/discussions on risk-averse RL/LQR problems are limited, in particular, the choice of risk-averse metrics

Videos

Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator · SlidesLive

Taxonomy

Topics: Fault Detection and Control Systems · Advanced Control Systems Optimization

Methods: Sparse Evolutionary Training