Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems
Yilie Huang, Yanwei Jia, Xun Yu Zhou

TL;DR
This paper introduces a model-free reinforcement learning algorithm for continuous-time linear-quadratic control problems whose volatility depends on both state and control. The algorithm achieves a sublinear regret bound and is validated through simulations.
Contribution
It develops a novel RL algorithm with an exploration schedule for continuous-time LQ problems, providing theoretical regret bounds and empirical validation.
Findings
Achieves a regret bound of O(N^{3/4}) up to logarithmic factors.
Demonstrates better regret performance than recent model-based methods.
Validates theoretical results through simulation studies.
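To see why a bound of order N^{3/4} counts as sublinear, the following minimal Python sketch evaluates a hypothetical bound of the form c * N^{3/4} * log(N) and the corresponding average regret per episode. The constant `c`, the single power of the logarithm, and the function names are illustrative assumptions, not values taken from the paper.

```python
import math


def regret_bound(n: int, c: float = 1.0) -> float:
    """Hypothetical cumulative regret bound c * N^(3/4) * log(N).

    The constant c and the exact logarithmic factor are assumptions
    made for illustration; the source only states O(N^{3/4}) up to
    logarithmic factors.
    """
    return c * n ** 0.75 * math.log(n)


def average_regret(n: int, c: float = 1.0) -> float:
    """Per-episode regret implied by the hypothetical bound above."""
    return regret_bound(n, c) / n


if __name__ == "__main__":
    # The cumulative bound grows with N, but the per-episode average
    # shrinks roughly like N^(-1/4) * log(N) once N is large enough.
    for n in (10**2, 10**4, 10**6):
        print(n, round(regret_bound(n), 2), round(average_regret(n), 4))
```

The cumulative bound keeps growing while the per-episode average tends to zero, which is what "sublinear regret" means in practice: the learned policy becomes asymptotically as good as the optimal one on average.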
Abstract
We study reinforcement learning (RL) for a class of continuous-time linear-quadratic (LQ) control problems for diffusions, where states are scalar-valued and running control rewards are absent but volatilities of the state processes depend on both state and control variables. We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an RL algorithm to learn the optimal policy parameter directly. Our main contributions include the introduction of an exploration schedule and a regret analysis of the proposed algorithm. We provide the convergence rate of the policy parameter to the optimal one, and prove that the algorithm achieves a regret bound of O(N^{3/4}) up to a logarithmic factor, where N is the number of learning episodes. We conduct a simulation study to validate the theoretical results and demonstrate the…
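The problem class described in the abstract can be written down as a rough sketch. The coefficient symbols A, B, C, D, Q, H, the horizon T, and the sign and scaling conventions below are assumptions chosen for illustration; the abstract itself only specifies a scalar state, volatility that depends on both state and control, and the absence of a running control reward.

```latex
% Scalar state x^u driven by a one-dimensional Brownian motion W;
% both drift and volatility are linear in state x and control u.
\mathrm{d}x^u(t) = \bigl(A\,x^u(t) + B\,u(t)\bigr)\,\mathrm{d}t
                 + \bigl(C\,x^u(t) + D\,u(t)\bigr)\,\mathrm{d}W(t),
\qquad x^u(0) = x_0 .

% Quadratic objective with no running term in u
% ("running control rewards are absent").
\max_{u}\; \mathbb{E}\!\left[ \int_0^T -\tfrac{1}{2}\,Q\,x^u(t)^2 \,\mathrm{d}t
  \;-\; \tfrac{1}{2}\,H\,x^u(T)^2 \right].
```

Under this reading the control affects the objective only through the state dynamics, including through the volatility term, and a model-free learner would update the policy parameter directly without estimating the coefficients.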
Taxonomy
Topics: Adaptive Dynamic Programming Control · Iterative Learning Control Systems
