Thompson Sampling Achieves $\tilde O(\sqrt{T})$ Regret in Linear Quadratic Control
Taylan Kargin, Sahin Lale, Kamyar Azizzadenesheli, Anima Anandkumar, Babak Hassibi

TL;DR
This paper introduces an efficient Thompson Sampling algorithm for adaptive control of unknown linear-quadratic regulators, achieving near-optimal regret bounds without requiring prior stabilizing controllers.
Contribution
The paper presents TSAC, a Thompson Sampling-based method that attains $\tilde O(\sqrt{T})$ regret for multidimensional LQRs without requiring a known stabilizing controller, resolving the open problem posed in Abeille and Lazaric (2018).
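The regret notion in this claim can be written out as follows. This is a sketch of the standard LQR regret setting in our own notation, not copied from the paper: $A_*, B_*$ are the unknown true dynamics, $w_t$ is zero-mean noise with covariance $W$ (an assumption of this sketch), $Q, R$ are known cost matrices, and $P_*$ solves the discrete algebraic Riccati equation of the true system.

```latex
% Dynamics and per-step cost (standard LQR setting; notation assumed)
x_{t+1} = A_* x_t + B_* u_t + w_t, \qquad
c_t = x_t^\top Q x_t + u_t^\top R u_t

% Optimal infinite-horizon average cost of the true system
P_* = Q + A_*^\top P_* A_* - A_*^\top P_* B_* \left(R + B_*^\top P_* B_*\right)^{-1} B_*^\top P_* A_*,
\qquad J_* = \operatorname{tr}(P_* W)

% Regret of an adaptive controller after T steps, and the rate claimed for TSAC
\mathcal{R}(T) = \sum_{t=1}^{T} \left(c_t - J_*\right) = \tilde O(\sqrt{T})
```

Here $\tilde O(\cdot)$ hides polylogarithmic factors in $T$, so $\tilde O(\sqrt{T})$ regret means the average excess cost per step, $\mathcal{R}(T)/T$, shrinks roughly like $1/\sqrt{T}$.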
Findings
TSAC achieves order-optimal regret in multidimensional LQRs.
The algorithm stabilizes systems quickly through effective exploration.
Empirical results demonstrate TSAC's efficiency and performance.
Abstract
Thompson Sampling (TS) is an efficient method for decision-making under uncertainty, where an action is sampled from a carefully prescribed distribution which is updated based on the observed data. In this work, we study the problem of adaptive control of stabilizable linear-quadratic regulators (LQRs) using TS, where the system dynamics are unknown. Previous works have established that $\tilde O(\sqrt{T})$ frequentist regret is optimal for the adaptive control of LQRs. However, the existing methods either work only in restrictive settings, require a priori known stabilizing controllers, or utilize computationally intractable approaches. We propose an efficient TS algorithm for the adaptive control of LQRs, TS-based Adaptive Control (TSAC), that attains $\tilde O(\sqrt{T})$ regret, even for multidimensional systems, thereby solving the open problem posed in Abeille and Lazaric (2018).…
Taxonomy
Topics: Advanced Bandit Algorithms Research
Methods: Spatio-temporal stability analysis
