Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems

Yilie Huang; Yanwei Jia; Xun Yu Zhou

arXiv:2407.17226 · cs.LG · July 25, 2025



TL;DR

This paper introduces a model-free reinforcement learning algorithm for continuous-time linear-quadratic control problems whose volatility depends on both the state and the control. The algorithm achieves a sublinear regret bound, and the theoretical results are validated through simulations.

Contribution

The paper develops a novel RL algorithm with an exploration schedule for continuous-time LQ problems, and supports it with a theoretical regret bound and empirical validation.
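An exploration schedule prescribes how much randomness the policy injects in each learning episode. The paper's actual schedule is not reproduced on this page, so the sketch below uses a hypothetical polynomially decaying variance `gamma0 * n**(-decay)`; the function name and both parameters are illustrative assumptions, meant only to show the idea of exploring heavily early and less as episodes accumulate.

```python
def exploration_variance(n: int, gamma0: float = 1.0, decay: float = 0.5) -> float:
    """HYPOTHETICAL schedule (not the paper's): variance of the Gaussian
    exploration noise used in episode n (1-indexed), gamma0 * n**(-decay)."""
    if n < 1:
        raise ValueError("episode index n must be >= 1")
    if decay <= 0.0:
        raise ValueError("decay must be positive so that exploration shrinks")
    return gamma0 * n ** (-decay)


# Exploration is front-loaded: the variance shrinks as episodes accumulate.
variances = [exploration_variance(n) for n in (1, 4, 16, 64)]
print(variances)
```

Any schedule of this kind trades off two costs: too much late exploration adds regret directly, while too little early exploration slows the convergence of the learned policy parameter.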

Findings

01. Achieves a regret bound of O(N^{3/4}) up to a logarithmic factor, where N is the number of learning episodes.

02. Demonstrates better regret performance than recent model-based methods.

03. Validates the theoretical results through simulation studies.
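"Sublinear" means the cumulative regret grows more slowly than N, so the average regret per episode tends to zero. The arithmetic below illustrates this for a bound of the form const * N^{3/4} * log N; the constant and the use of exactly one log N factor are placeholder assumptions, since the paper states the bound only up to a logarithmic factor.

```python
import math


def illustrative_regret(n_episodes: int, const: float = 1.0) -> float:
    """Placeholder regret bound const * N**(3/4) * log(N).
    The constant and the single log factor are assumptions for illustration."""
    return const * n_episodes ** 0.75 * math.log(n_episodes)


for N in (10**2, 10**4, 10**6, 10**8):
    # Average regret per episode behaves like N**(-1/4) * log N, which tends to 0.
    avg = illustrative_regret(N) / N
    print(f"N={N:>9}  average regret per episode = {avg:.4f}")
```

The N^{-1/4} decay is slow, so the per-episode gap closes gradually; a linear-regret method would instead leave a constant gap per episode.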

Abstract

We study reinforcement learning (RL) for a class of continuous-time linear-quadratic (LQ) control problems for diffusions, where states are scalar-valued and running control rewards are absent, but volatilities of the state processes depend on both state and control variables. We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an RL algorithm to learn the optimal policy parameter directly. Our main contributions include the introduction of an exploration schedule and a regret analysis of the proposed algorithm. We provide the convergence rate of the policy parameter to the optimal one, and prove that the algorithm achieves a regret bound of $O(N^{3/4})$ up to a logarithmic factor, where $N$ is the number of learning episodes. We conduct a simulation study to validate the theoretical results and demonstrate the…
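To make the model class concrete, the sketch below simulates one episode of a scalar diffusion whose drift and volatility are both linear in the state and control, dX = (A X + B u) dt + (C X + D u) dW, using the Euler–Maruyama scheme. This dynamics form, the linear-feedback Gaussian policy u ~ N(theta * x, explore_var), and all function and parameter names are illustrative assumptions; the learner in the paper is model-free and never uses A, B, C, D directly, so they appear here only to generate data.

```python
import numpy as np


def simulate_episode(theta, A, B, C, D, x0=1.0, T=1.0, n_steps=100,
                     explore_var=0.0, rng=None):
    """Euler-Maruyama path of the ASSUMED dynamics
    dX = (A X + B u) dt + (C X + D u) dW under the HYPOTHETICAL Gaussian
    policy u ~ N(theta * x, explore_var). Returns the n_steps + 1 states."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        # Sample the action around the linear feedback theta * x.
        u = theta * x[k] + np.sqrt(explore_var) * rng.standard_normal()
        # Brownian increment over one step has variance dt.
        dW = np.sqrt(dt) * rng.standard_normal()
        # Volatility depends on both state and control (C x + D u).
        x[k + 1] = x[k] + (A * x[k] + B * u) * dt + (C * x[k] + D * u) * dW
    return x


path = simulate_episode(theta=-0.5, A=0.2, B=1.0, C=0.3, D=0.5,
                        explore_var=0.1, rng=np.random.default_rng(0))
print(path[-1])
```

Setting C = D = 0 and explore_var = 0 removes all randomness and reduces each step to x_{k+1} = x_k (1 + (A + B theta) dt), which is a convenient sanity check for the discretization.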


Taxonomy

Topics: Adaptive Dynamic Programming Control · Iterative Learning Control Systems