Learning-based Control of Unknown Linear Systems with Thompson Sampling

Yi Ouyang; Mukul Gagrani; Rahul Jain

arXiv:1709.04047·cs.SY·September 14, 2017·39 cites

TL;DR

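This paper introduces TSDE (Thompson sampling with dynamic episodes), a learning algorithm for Linear Quadratic (LQ) control of linear systems with unknown parameters. Its expected (Bayesian) regret up to time T is bounded by Õ(√T), where Õ(·) hides constants and logarithmic factors, and a reinitialization schedule makes it robust to time-varying parameters.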

Contribution

The paper presents the first Õ(√T) bound on expected (Bayesian) regret for learning in LQ control, obtained with Thompson sampling with dynamic episodes, and adds a reinitialization schedule to handle time-varying parameters.
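
For orientation, one standard way to write the expected (Bayesian) regret in average-cost LQ control is sketched below. The symbols are assumptions introduced here for illustration and are not taken from the text above: c_t is the stage cost at time t, J(θ) is the optimal long-run average cost under the true parameter θ, and the expectation is over the prior on θ and the system noise.

```latex
% Sketch under assumed notation: c_t = stage cost, J(\theta) = optimal average cost
R(T) \;=\; \mathbb{E}\!\left[\, \sum_{t=1}^{T} c_t \;-\; T\, J(\theta) \right],
\qquad
R(T) \;\le\; \tilde{O}\big(\sqrt{T}\big)
```

Under this reading, a bound of order √T means the average per-step excess cost shrinks roughly like 1/√T, up to constants and logarithmic factors.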

Findings

01

Achieves an Õ(√T) bound on expected (Bayesian) regret, up to constants and logarithmic factors.

02

Shows robustness to time-varying model parameters by adding a reinitialization schedule.

03

Provides numerical simulations validating the approach.

Abstract

We propose a Thompson sampling-based learning algorithm for the Linear Quadratic (LQ) control problem with unknown system parameters. The algorithm is called Thompson sampling with dynamic episodes (TSDE) where two stopping criteria determine the lengths of the dynamic episodes in Thompson sampling. The first stopping criterion controls the growth rate of episode length. The second stopping criterion is triggered when the determinant of the sample covariance matrix is less than half of the previous value. We show under some conditions on the prior distribution that the expected (Bayesian) regret of TSDE accumulated up to time T is bounded by \tilde{O}(\sqrt{T}). Here \tilde{O}(\cdot) hides constants and logarithmic factors. This is the first \tilde{O}(\sqrt{T}) bound on expected regret of learning in LQ control. By introducing a reinitialization schedule, we also show that the algorithm is robust to time-varying…
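
The two stopping criteria described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `episode_starts`, the initialization (`t_k = 0`, `prev_len = 0`), and the caller-supplied list `dets` of covariance determinants are all assumptions made for illustration. The first rule is read here as "the current episode may run at most one step longer than the previous one", and the second as "the determinant has dropped below half of its value at the start of the episode".

```python
def episode_starts(dets, horizon):
    """Return the times at which a new episode begins.

    dets[t] is a caller-supplied determinant of the covariance matrix at
    time t (hypothetical input; in practice it would come from the
    posterior update). Times run from 1 to horizon inclusive.
    """
    starts = []
    t_k = 0        # start time of the current episode (assumed initialization)
    prev_len = 0   # length of the previous episode (assumed initialization)
    for t in range(1, horizon + 1):
        # Criterion 1: limit how fast episode lengths can grow.
        grew_too_long = t > t_k + prev_len
        # Criterion 2: determinant fell below half its episode-start value.
        det_halved = dets[t] < 0.5 * dets[t_k]
        if grew_too_long or det_halved:
            prev_len = t - t_k
            t_k = t
            starts.append(t)
    return starts


# Constant determinant: only criterion 1 fires, so episode lengths grow by one.
constant = episode_starts([1.0] * 11, 10)

# Rapidly shrinking determinant: criterion 2 fires at every step.
shrinking = episode_starts([0.4 ** t for t in range(6)], 5)
```

In the first call only the length rule is active, so each episode runs one step longer than the one before. In the second call the determinant rule cuts every episode short, which is the situation where new data is sharply reducing uncertainty and resampling is worthwhile.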

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Optimization and Search Problems