Data-Driven Exploration for a Class of Continuous-Time Indefinite Linear-Quadratic Reinforcement Learning Problems
Yilie Huang, Xun Yu Zhou

TL;DR
This paper introduces an adaptive, data-driven exploration method for reinforcement learning in continuous-time stochastic LQ control. It improves learning efficiency over fixed exploration schedules while matching their best-known regret bound.
Contribution
It proposes a model-free adaptive exploration mechanism in which the critic adjusts the entropy regularization and the actor adjusts the policy variance. This improves learning efficiency in continuous-time LQ RL problems with minimal tuning.
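To make the two exploration knobs concrete, here is a minimal Python sketch (NumPy assumed) that simulates a scalar SDE whose volatility depends on both state and control, under a Gaussian exploratory policy, and adds an entropy bonus. The policy form N(theta*x, phi), the coefficient names (A, B, C, D), and the reward weights (Q, H) are illustrative assumptions, not the paper's formulation. Here gamma stands in for the entropy-regularization weight and phi for the policy variance. Both are held fixed, and the paper's adaptive update rules are not reproduced.

```python
import numpy as np


def gaussian_entropy(phi):
    """Differential entropy of a 1-D Gaussian with variance phi: 0.5 * log(2*pi*e*phi)."""
    return 0.5 * np.log(2.0 * np.pi * np.e * phi)


def simulate_episode(theta, phi, A, B, C, D, Q, H, gamma,
                     x0=1.0, T=1.0, n_steps=100, rng=None):
    """Euler-Maruyama rollout of the scalar SDE
        dX = (A X + B u) dt + (C X + D u) dW
    under a hypothetical Gaussian exploratory policy u ~ N(theta * X, phi).

    Returns the entropy-regularized reward
        sum_k (-0.5 * Q * X_k^2 + gamma * entropy) * dt - 0.5 * H * X_T^2.
    The running reward has no control term (assumed, mirroring the
    "no running control reward" setting); Q, H, gamma are illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    # With a state-independent policy variance, the entropy bonus is constant.
    bonus = gamma * gaussian_entropy(phi)
    x = x0
    total = 0.0
    for _ in range(n_steps):
        u = rng.normal(theta * x, np.sqrt(phi))   # sample an exploratory action
        total += (-0.5 * Q * x**2 + bonus) * dt   # state-only running reward + entropy bonus
        dw = rng.normal(0.0, np.sqrt(dt))         # Brownian increment over one step
        # The control enters the diffusion through D * u, so a larger policy
        # variance phi also makes the state noisier when D != 0.
        x = x + (A * x + B * u) * dt + (C * x + D * u) * dw
    total += -0.5 * H * x**2                      # terminal reward
    return float(total)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for phi in (0.01, 0.1, 1.0):
        returns = [simulate_episode(theta=-1.0, phi=phi, A=-0.5, B=1.0, C=0.2,
                                    D=0.5, Q=1.0, H=1.0, gamma=0.1, rng=rng)
                   for _ in range(500)]
        print(f"phi={phi}: mean regularized reward {np.mean(returns):.3f}")
```

Comparing the printed means across phi gives a rough feel for the trade-off an exploration schedule must manage. A larger phi raises the entropy bonus, but through the D*u diffusion term it also inflates the state noise.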
Findings
Achieves a sublinear regret bound matching the best-known model-free results for this class of LQ problems.
Accelerates convergence compared to non-adaptive methods.
Improves regret performance over model-based approaches.
Abstract
We study reinforcement learning (RL) for the same class of continuous-time stochastic linear-quadratic (LQ) control problems as in \cite{huang2024sublinear}, where volatilities depend on both states and controls, states are scalar-valued, and running control rewards are absent. We propose a model-free, data-driven exploration mechanism in which the critic adaptively adjusts the entropy regularization and the actor adaptively adjusts the policy variance. Unlike the constant or deterministic exploration schedules employed in \cite{huang2024sublinear}, which require extensive tuning for implementations and ignore learning progress across iterations, our adaptive exploratory approach boosts learning efficiency with minimal tuning. Despite its flexibility, our method achieves a sublinear regret bound that matches the best-known model-free results for this class of LQ problems, which were previously derived…
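For orientation, a generic entropy-regularized instance of this problem class might be written as below. The coefficient names (A, B, C, D, Q, H), the temperature $\gamma$, and the placement of the entropy term are notational assumptions, not the paper's exact formulation. The features taken from the abstract are a scalar state, a volatility $C X_t + D u_t$ that depends on both state and control, and a running reward with no control term. The control weight is therefore zero rather than positive definite, which is what "indefinite" refers to.

```latex
\begin{aligned}
dX_t &= (A X_t + B u_t)\,dt + (C X_t + D u_t)\,dW_t, \qquad X_0 = x_0 \in \mathbb{R},\\
J(\pi) &= \mathbb{E}\Big[\int_0^T \Big(-\tfrac{1}{2} Q X_t^2 + \gamma\,\mathcal{H}\big(\pi(\cdot \mid X_t)\big)\Big)\,dt \;-\; \tfrac{1}{2} H X_T^2\Big],\\
\mathcal{H}(\pi) &= -\int_{\mathbb{R}} \pi(u)\,\log \pi(u)\,du .
\end{aligned}
```

In this notation, the critic-side knob is $\gamma$ and the actor-side knob is the variance of the (typically Gaussian) exploratory policy $\pi(\cdot \mid x)$.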
Taxonomy
Topics: Advanced Control Systems Optimization · Extremum Seeking Control Systems · Adaptive Dynamic Programming Control
