Learning-based Control of Unknown Linear Systems with Thompson Sampling
Yi Ouyang, Mukul Gagrani, Rahul Jain

TL;DR
This paper introduces TSDE, a Thompson sampling algorithm for unknown linear systems, achieving a near-optimal regret bound of O(√T) and demonstrating robustness to parameter drift through reinitialization.
Contribution
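The paper presents the first O(√T) bound (up to constants and logarithmic factors) on expected regret for learning in LQ control. The bound is obtained with Thompson sampling with dynamic episodes (TSDE), and a reinitialization schedule extends the algorithm to time-varying parameters.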
Findings
Achieves an O(√T) bound on expected (Bayesian) regret, up to constants and logarithmic factors.
Demonstrates robustness to time-varying model parameters.
Provides numerical simulations validating the approach.
Abstract
We propose a Thompson sampling-based learning algorithm for the Linear Quadratic (LQ) control problem with unknown system parameters. The algorithm is called Thompson sampling with dynamic episodes (TSDE), where two stopping criteria determine the lengths of the dynamic episodes in Thompson sampling. The first stopping criterion controls the growth rate of episode length. The second stopping criterion is triggered when the determinant of the sample covariance matrix is less than half of the previous value. We show under some conditions on the prior distribution that the expected (Bayesian) regret of TSDE accumulated up to time T is bounded by \tilde{O}(\sqrt{T}). Here \tilde{O}(\cdot) hides constants and logarithmic factors. This is the first \tilde{O}(\sqrt{T}) bound on expected regret of learning in LQ control. By introducing a reinitialization schedule, we also show that the algorithm is robust to time-varying…
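The two stopping criteria described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made for illustration: the names `should_start_new_episode` and `episode_starts`, reading "the previous value" as the determinant at the start of the current episode, implementing the first criterion as "the current episode has run longer than the previous one", and a Gaussian posterior with unit noise variance for the covariance update.

```python
import numpy as np


def should_start_new_episode(t, t_k, prev_len, cov_now, cov_at_start):
    """Return True if either stopping criterion fires at time t.

    t_k is the start time of the current episode and prev_len is the
    length of the previous episode (hypothetical names for this sketch).
    """
    # Criterion 1 (assumed form): the current episode has outlasted the
    # previous one, so episode length grows by at most one step per episode.
    too_long = t > t_k + prev_len
    # Criterion 2: the covariance determinant has dropped below half of
    # its value at the start of the current episode (assumed reading).
    shrunk = np.linalg.det(cov_now) < 0.5 * np.linalg.det(cov_at_start)
    return bool(too_long or shrunk)


def episode_starts(regressors, prior_cov):
    """Replay regressors z_t and return the times at which episodes begin."""
    precision = np.linalg.inv(prior_cov)
    cov = prior_cov.copy()
    cov_at_start = cov.copy()
    t_k, prev_len = 0, 0
    starts = [0]
    for t, z in enumerate(regressors):
        if t > 0 and should_start_new_episode(t, t_k, prev_len, cov, cov_at_start):
            prev_len = t - t_k          # length of the episode that just ended
            t_k = t                     # a new parameter sample would be drawn here
            cov_at_start = cov.copy()
            starts.append(t)
        # Assumed Gaussian posterior update with unit noise variance.
        precision = precision + np.outer(z, z)
        cov = np.linalg.inv(precision)
    return starts


# With uninformative (zero) regressors only criterion 1 fires, so the
# episode lengths grow as 1, 2, 3, 4.
print(episode_starts(np.zeros((11, 2)), np.eye(2)))
```

In this sketch, a single highly informative observation shrinks the determinant sharply and ends the episode early through the second criterion, while the first criterion alone produces slowly lengthening episodes.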
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Optimization and Search Problems
