Online Learning for Supervisory Switching Control
Haoyuan Sun, Ali Jadbabaie

TL;DR
This paper introduces a non-asymptotic supervisory control method for partially-observed linear systems. It uses multi-armed bandit algorithms to identify the best controller with finite-time guarantees, even when some candidate controllers may destabilize the system.
Contribution
It develops a data-driven, control-theoretic bandit approach that provides finite-time performance bounds and stability detection in supervisory control of unknown systems.
Findings
Algorithms identify the best controller in O(N log N) steps
Finite-time guarantees for controller selection and stability detection
Achieves finite L2-gain with respect to disturbances
Abstract
We study supervisory switching control for partially-observed linear dynamical systems. The objective is to identify and deploy the best controller for the unknown system by periodically selecting among a collection of candidate controllers, some of which may destabilize the underlying system. While classical estimator-based supervisory control guarantees asymptotic stability, it lacks quantitative finite-time performance bounds. Conversely, current non-asymptotic methods in both online learning and system identification require restrictive assumptions that are incompatible with a control setting, such as system stability, which preclude testing potentially unstable controllers. To bridge this gap, we propose a novel, non-asymptotic analysis of supervisory control that adapts multi-armed bandit algorithms to a control-theoretic setting. The proposed data-driven algorithm evaluates…
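Illustrative sketch
The idea of treating candidate controllers as bandit arms can be illustrated with a toy example. The sketch below is not the authors' algorithm: it assumes a scalar, fully observed system x_{t+1} = a*x_t + b*u_t + w_t with static gains u_t = -k*x_t, restarts every trial from x_0 = 1 (which a real closed-loop system would not allow), and uses a generic successive-elimination rule with a hand-picked confidence width. All names (run_epoch, select_controller, blowup, width) are hypothetical.

```python
import math
import random

def run_epoch(a, b, k, steps, rng, noise=0.1, blowup=1e3):
    """Run u_t = -k * x_t on x_{t+1} = a*x_t + b*u_t + w_t from x_0 = 1.

    Returns the average quadratic cost, or None if |x| exceeds `blowup`
    (treated here, by assumption, as evidence of a destabilizing controller).
    """
    x, cost = 1.0, 0.0
    for _ in range(steps):
        u = -k * x
        cost += x * x + u * u
        x = a * x + b * u + rng.gauss(0.0, noise)
        if abs(x) > blowup:
            return None
    return cost / steps

def select_controller(a, b, gains, rounds=20, steps=50, width=0.5, seed=0):
    """Successive-elimination-style selection over candidate gains."""
    rng = random.Random(seed)
    active = list(range(len(gains)))
    flagged = set()                      # controllers judged destabilizing
    mean = {i: 0.0 for i in active}      # running mean cost per controller
    for n in range(1, rounds + 1):
        for i in list(active):
            c = run_epoch(a, b, gains[i], steps, rng)
            if c is None:
                active.remove(i)
                flagged.add(i)
            else:
                # Every active controller has exactly n samples after this update.
                mean[i] += (c - mean[i]) / n
        if not active:
            return None, flagged
        # Drop controllers whose lower confidence bound exceeds the
        # upper confidence bound of the current best.
        radius = width / math.sqrt(n)
        best = min(mean[i] for i in active)
        active = [i for i in active if mean[i] - radius <= best + radius]
    return min(active, key=lambda i: mean[i]), flagged

# Open-loop unstable plant (a = 1.2); gains 0.0 and 2.5 leave the loop unstable.
best, flagged = select_controller(a=1.2, b=1.0, gains=[0.0, 0.3, 1.0, 2.5])
print(best, sorted(flagged))
```

In this toy setup the closed-loop pole is a - b*k, so the gains 0.0 and 2.5 give poles of magnitude above one and are flagged, while the remaining two gains are compared on average cost. The partially observed setting and the finite-time guarantees described in the abstract are outside the scope of this sketch.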
Taxonomy
Topics: Advanced Bandit Algorithms Research · Model Reduction and Neural Networks · Control Systems and Identification
