Data-Driven Exploration for a Class of Continuous-Time Indefinite Linear--Quadratic Reinforcement Learning Problems

Yilie Huang; Xun Yu Zhou

arXiv:2507.00358 · cs.LG · July 24, 2025

TL;DR

This paper introduces an adaptive, data-driven exploration method for reinforcement learning in continuous-time stochastic LQ control. It improves learning efficiency over fixed exploration schedules while achieving a sublinear regret bound that matches the best-known model-free result.

Contribution

It proposes a model-free adaptive exploration mechanism in which the critic adjusts the entropy regularization and the actor adjusts the policy variance, improving learning efficiency in continuous-time LQ RL problems with minimal tuning.

Findings

01 Achieves a sublinear regret bound matching the best-known model-free results for this class of LQ problems.

02 Accelerates convergence compared to non-adaptive methods.

03 Improves regret performance over model-based approaches.

Abstract

We study reinforcement learning (RL) for the same class of continuous-time stochastic linear--quadratic (LQ) control problems as in \cite{huang2024sublinear}, where volatilities depend on both states and controls, states are scalar-valued, and running control rewards are absent. We propose a model-free, data-driven exploration mechanism that adaptively adjusts entropy regularization through the critic and policy variance through the actor. Unlike the constant or deterministic exploration schedules employed in \cite{huang2024sublinear}, which require extensive tuning in implementation and ignore learning progress across iterations, our adaptive exploratory approach boosts learning efficiency with minimal tuning. Despite its flexibility, our method achieves a sublinear regret bound that matches the best-known model-free results for this class of LQ problems, which were previously derived…
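To make the problem class in the abstract concrete, the following is a minimal sketch of one episode under a Gaussian exploratory policy. It is an illustration under stated assumptions, not the authors' algorithm: the dynamics form `dx = (A x + B u) dt + (C x + D u) dW`, the reward weights `Q` and `H`, the policy parameters `phi1` and `policy_var`, and all default values are hypothetical choices made here to match the description (scalar state, volatility depending on state and control, no running control reward). The actor and critic update rules are not shown because the abstract does not state them.

```python
import numpy as np


def simulate_episode(phi1, policy_var, *, A=-0.5, B=1.0, C=0.2, D=0.3,
                     Q=1.0, H=1.0, x0=1.0, T=1.0, n_steps=1000, rng=None):
    """Euler-Maruyama rollout of an assumed scalar LQ system.

    Assumed dynamics: dx = (A x + B u) dt + (C x + D u) dW.
    Assumed policy:   u ~ N(phi1 * x, policy_var), resampled every step.
    Assumed reward:   -0.5 * Q * x^2 integrated over [0, T], minus
                      0.5 * H * x(T)^2 (no running control reward).
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    x = x0
    reward = 0.0
    for _ in range(n_steps):
        # Exploration enters only through the policy variance.
        u = phi1 * x + np.sqrt(policy_var) * rng.standard_normal()
        dW = np.sqrt(dt) * rng.standard_normal()
        # Running reward depends on the state only.
        reward -= 0.5 * Q * x**2 * dt
        # Both drift and volatility depend on state and control.
        x += (A * x + B * u) * dt + (C * x + D * u) * dW
    reward -= 0.5 * H * x**2
    return x, reward


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A larger policy_var means more exploration in the sampled controls.
    for var in (0.0, 0.1, 1.0):
        x_T, r = simulate_episode(phi1=-0.5, policy_var=var, rng=rng)
        print(f"policy_var={var}: x(T)={x_T:.4f}, reward={r:.4f}")
```

In this sketch `policy_var` is a plain argument. A fixed schedule would set it in advance for each iteration, whereas the adaptive approach described in the abstract would let the learner update it from data.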

Taxonomy

Topics: Advanced Control Systems Optimization · Extremum Seeking Control Systems · Adaptive Dynamic Programming Control