Thompson Sampling Achieves $\tilde O(\sqrt{T})$ Regret in Linear Quadratic Control

Taylan Kargin; Sahin Lale; Kamyar Azizzadenesheli; Anima Anandkumar; Babak Hassibi

arXiv:2206.08520 · cs.LG · June 20, 2022 · 1 cites

Open Access

TL;DR

This paper introduces an efficient Thompson Sampling algorithm for adaptive control of unknown linear-quadratic regulators, achieving near-optimal regret bounds without requiring prior stabilizing controllers.

Contribution

The paper presents TSAC, a novel Thompson Sampling-based method that attains $\tilde O(\sqrt{T})$ regret for multidimensional LQRs without requiring a prior stabilizing controller, solving a key open problem.

Findings

01

TSAC achieves order-optimal regret in multidimensional LQRs.

02

The algorithm stabilizes systems quickly through effective exploration.

03

Empirical results demonstrate TSAC's efficiency and performance.

Abstract

Thompson Sampling (TS) is an efficient method for decision-making under uncertainty, where an action is sampled from a carefully prescribed distribution which is updated based on the observed data. In this work, we study the problem of adaptive control of stabilizable linear-quadratic regulators (LQRs) using TS, where the system dynamics are unknown. Previous works have established that $\tilde{O}(\sqrt{T})$ frequentist regret is optimal for the adaptive control of LQRs. However, the existing methods either work only in restrictive settings, require a priori known stabilizing controllers, or utilize computationally intractable approaches. We propose an efficient TS algorithm for the adaptive control of LQRs, TS-based Adaptive Control (TSAC), that attains $\tilde{O}(\sqrt{T})$ regret, even for multidimensional systems, thereby solving the open problem posed in Abeille and Lazaric (2018).…
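The sample-then-act loop described in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the paper's TSAC algorithm: it assumes a scalar system $x_{t+1} = a x_t + b u_t + w_t$ with unit-variance Gaussian noise, a Gaussian posterior from ridge regression, and a fresh sample at every step. The names `lqr_gain`, `thompson_lqr`, `lam`, and `b_min` are hypothetical, and the resampling safeguard on small input gains is an assumption made only to keep the toy loop well behaved.

```python
import numpy as np


def lqr_gain(a, b, q=1.0, r=1.0, iters=200):
    """Gain k of the scalar LQR policy u = -k * x, via Riccati fixed-point iteration."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def thompson_lqr(a_true=1.2, b_true=1.0, T=200, lam=1.0, b_min=0.05, seed=0):
    """Toy Thompson Sampling loop on x' = a*x + b*u + w (illustrative only)."""
    rng = np.random.default_rng(seed)
    V = lam * np.eye(2)   # regularized design matrix (posterior precision)
    s = np.zeros(2)       # running sum of z_t * x_{t+1}, with z_t = (x_t, u_t)
    x = 0.0
    states = [x]
    for t in range(T):
        theta_hat = np.linalg.solve(V, s)   # ridge estimate, used as posterior mean
        cov = np.linalg.inv(V)              # posterior covariance under unit noise
        # Sample a model (a_s, b_s); retry if the sampled input gain is near zero
        # (assumed safeguard, not taken from the paper).
        for _ in range(100):
            a_s, b_s = rng.multivariate_normal(theta_hat, cov)
            if abs(b_s) >= b_min:
                break
        u = -lqr_gain(a_s, b_s) * x         # act optimally for the sampled model
        x_next = a_true * x + b_true * u + rng.normal()
        z = np.array([x, u])
        V += np.outer(z, z)                 # update the posterior with new data
        s += z * x_next
        x = x_next
        states.append(x)
    return np.array(states), np.linalg.solve(V, s)
```

Calling `thompson_lqr()` returns the state trajectory and the final ridge estimate of $(a, b)$. The open-loop system in the default setting is unstable ($a = 1.2$), so the loop has to stabilize it from data alone, which is the scalar analogue of operating without a known stabilizing controller.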

Taxonomy

Topics: Advanced Bandit Algorithms Research

Methods: Spatio-temporal stability analysis