Scalable regret for learning to control network-coupled subsystems with unknown dynamics

Sagar Sudhakara; Aditya Mahajan; Ashutosh Nayyar; Yi Ouyang

arXiv:2108.07970 · eess.SY · August 19, 2021 · 1 citation

TL;DR

This paper introduces a scalable Thompson sampling algorithm for controlling network-coupled linear quadratic Gaussian systems with unknown dynamics, achieving regret that scales linearly with the number of subsystems.

Contribution

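The paper proposes a Thompson sampling based learning algorithm that exploits the structure of the underlying network. Its regret bound grows linearly with the number of subsystems, whereas applying existing LQG learning algorithms directly to the global system yields regret that grows super-linearly.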

Findings

01

Expected regret bounded by $\tilde{O}(n\sqrt{T})$ for the proposed algorithm

02

Regret scales linearly with the number of subsystems

03

Numerical experiments confirm theoretical results

Abstract

We consider the problem of controlling an unknown linear quadratic Gaussian (LQG) system consisting of multiple subsystems connected over a network. Our goal is to minimize and quantify the regret (i.e., loss in performance) of our strategy with respect to an oracle who knows the system model. Viewing the interconnected subsystems globally and directly using existing LQG learning algorithms for the global system results in a regret that increases super-linearly with the number of subsystems. Instead, we propose a new Thompson sampling based learning algorithm which exploits the structure of the underlying network. We show that the expected regret of the proposed algorithm is bounded by $\tilde{O}(n\sqrt{T})$, where $n$ is the number of subsystems, $T$ is the time horizon, and the $\tilde{O}(\cdot)$ notation hides logarithmic terms in $n$ and $T$. Thus, the…
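
To make the notions of Thompson sampling and regret in the abstract concrete, the following is a minimal sketch for a single scalar system, not the paper's network-exploiting algorithm. Everything in it is an assumption made for illustration: the dynamics $x_{t+1} = a x_t + u_t + w_t$ with unit-variance Gaussian noise and unknown $a$, the per-step cost $q x_t^2 + r u_t^2$, the Gaussian prior on $a$, the episode schedule (episode lengths 1, 2, 3, ...), and the function names `riccati_scalar`, `lqr_gain`, and `run_thompson_sampling`.

```python
import math
import random


def riccati_scalar(a, q=1.0, r=1.0):
    """Positive root P of the scalar Riccati equation P = q + a^2 r P / (r + P).

    This is the equation for the assumed system x+ = a x + u + w (input gain 1).
    """
    c = r - q - a * a * r
    return 0.5 * (-c + math.sqrt(c * c + 4.0 * q * r))


def lqr_gain(a, q=1.0, r=1.0):
    """Certainty-equivalent feedback gain K, so that u = -K x."""
    p = riccati_scalar(a, q, r)
    return a * p / (r + p)


def run_thompson_sampling(a_true=0.9, horizon=2000, q=1.0, r=1.0, seed=0):
    """Run a simplified Thompson sampling loop and return (posterior mean, regret)."""
    rng = random.Random(seed)
    mean, precision = 0.0, 1.0  # assumed Gaussian prior N(0, 1) on the unknown a
    x, total_cost, t = 0.0, 0.0, 0
    episode_len = 1  # assumed schedule: each episode is one step longer
    while t < horizon:
        # Sample a model from the posterior and act optimally for that sample.
        a_sample = rng.gauss(mean, 1.0 / math.sqrt(precision))
        gain = lqr_gain(a_sample, q, r)
        for _ in range(min(episode_len, horizon - t)):
            u = -gain * x
            total_cost += q * x * x + r * u * u
            x_next = a_true * x + u + rng.gauss(0.0, 1.0)
            # Bayesian linear regression update with y = x_next - u = a x + w.
            y = x_next - u
            new_precision = precision + x * x
            mean = (precision * mean + x * y) / new_precision
            precision = new_precision
            x = x_next
            t += 1
        episode_len += 1
    # With unit noise variance, the oracle's optimal average cost equals P(a_true).
    optimal_avg_cost = riccati_scalar(a_true, q, r)
    regret = total_cost - horizon * optimal_avg_cost
    return mean, regret
```

Calling `run_thompson_sampling()` returns the posterior mean of `a` and the realized regret, that is, the accumulated cost minus `horizon` times the oracle's optimal average cost. The value comes from a single random run, so it can fluctuate and is not the expected regret that the abstract bounds.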

Taxonomy

Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Influenza Virus Research Studies