Logarithmic Regret for Adversarial Online Control
Dylan J. Foster, Max Simchowitz

TL;DR
This paper presents a novel online control algorithm that achieves logarithmic regret against adversarial disturbances in known linear-quadratic systems, a significant improvement over previous methods, whose regret bounds scale as √T.
Contribution
It introduces the first algorithm with logarithmic regret for adversarial disturbances in known linear-quadratic control systems, using a new characterization of the optimal offline control law.
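The following is a minimal scalar sketch of what an "optimal offline control law" looks like. It rests on our own assumptions and is not the paper's code. The dynamics are x_{t+1} = a x_t + b u_t + w_t. The cost is Σ (q x_t² + r u_t²) plus a terminal cost p x_T², where p solves the scalar Riccati equation. The whole disturbance sequence w is known in advance. Under these assumptions, dynamic programming gives the usual LQR feedback plus a feedforward term on future disturbances: u_t = -k x_t - (b / (r + p b²)) Σ_{i≥t} (a - b k)^{i-t} p w_i, with k = p a b / (r + p b²). The paper's matrix-valued characterization may differ in details. The function names are hypothetical.

```python
def solve_scalar_dare(a, b, q, r, iters=10_000, tol=1e-12):
    """Fixed-point iteration for the scalar discrete algebraic Riccati equation
    p = q + a^2 p - (a b p)^2 / (r + b^2 p); assumes (a, b) is stabilizable."""
    p = q
    for _ in range(iters):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p


def offline_optimal_controls(a, b, q, r, x0, w):
    """Controls minimizing sum_t (q x_t^2 + r u_t^2) + p x_T^2 when every w_t is known up front."""
    p = solve_scalar_dare(a, b, q, r)
    k = p * a * b / (r + p * b * b)  # LQR feedback gain
    a_cl = a - b * k                 # closed-loop coefficient under u = -k x
    T = len(w)
    # Backward pass: h[t] = sum_{i >= t} a_cl^(i - t) * p * w[i], the future-disturbance term.
    h = [0.0] * (T + 1)
    for t in range(T - 1, -1, -1):
        h[t] = p * w[t] + a_cl * h[t + 1]
    x, us = x0, []
    for t in range(T):
        u = -k * x - b / (r + p * b * b) * h[t]  # feedback on x_t plus feedforward on w_t, w_{t+1}, ...
        us.append(u)
        x = a * x + b * u + w[t]
    return us, p, k


def feedback_only_controls(a, b, k, x0, w):
    """Causal baseline: plain LQR feedback u_t = -k x_t, ignoring future disturbances."""
    x, us = x0, []
    for wt in w:
        u = -k * x
        us.append(u)
        x = a * x + b * u + wt
    return us


def total_cost(a, b, q, r, x0, us, w, p_terminal):
    """Quadratic state/control cost along the trajectory, plus terminal cost p_terminal * x_T^2."""
    x, cost = x0, 0.0
    for u, wt in zip(us, w):
        cost += q * x * x + r * u * u
        x = a * x + b * u + wt
    return cost + p_terminal * x * x


# Usage: an arbitrary (possibly adversarial) disturbance sequence, known in hindsight.
a, b, q, r, x0 = 1.2, 1.0, 1.0, 1.0, 0.5
w = [0.3, -1.0, 0.8, 0.8, -0.2, 0.0, 1.0, -0.5]
us, p, k = offline_optimal_controls(a, b, q, r, x0, w)
print("offline optimal cost:", total_cost(a, b, q, r, x0, us, w, p))
print("feedback-only cost:  ", total_cost(a, b, q, r, x0, feedback_only_controls(a, b, k, x0, w), w, p))
```

With a = 1.2 and b = q = r = 1, the Riccati solution is p = (1.44 + √6.0736) / 2 ≈ 1.95. Because the total cost is a strictly convex quadratic in the control sequence, perturbing any single offline control can only increase it. The causal feedback-only controller can do no better than the offline one.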
Findings
Achieves logarithmic regret in adversarial online control.
Reduces the online control problem to (delayed) online learning with approximate advantage functions.
Does not need to control movement costs of the iterates, which leads to logarithmic regret.
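The advantage-function idea can be illustrated in the same hypothetical scalar model. The sketch assumes dynamics x_{t+1} = a x_t + b u_t + w_t, per-step cost q x² + r u², and a quadratic cost-to-go V_{t+1}(y) = p y² + 2 s y whose linear coefficient s summarizes future disturbances. Under these assumptions, the advantage A_t(x, u) = Q_t(x, u) - V_t(x) of a control u is exactly (r + p b²)(u - u*_t(x))², a strongly convex quadratic in u. Strong convexity is the kind of curvature that standard online-learning arguments exploit to obtain logarithmic regret. In the online setting, s depends on disturbances not yet observed, so a learner can at best form approximate advantages. The functions below are our own sketch, not the paper's construction.

```python
def q_value(a, b, q, r, p, s, w_t, x, u):
    """Q_t(x, u) = q x^2 + r u^2 + V_{t+1}(a x + b u + w_t), with the assumed cost-to-go
    V_{t+1}(y) = p y^2 + 2 s y (its constant term is dropped; it cancels in advantages)."""
    y = a * x + b * u + w_t
    return q * x * x + r * u * u + p * y * y + 2 * s * y


def greedy_control(a, b, r, p, s, w_t, x):
    """Minimizer of Q_t(x, .), from dQ/du = 2 r u + 2 b (p y + s) = 0."""
    return -b * (p * (a * x + w_t) + s) / (r + p * b * b)


def advantage(a, b, q, r, p, s, w_t, x, u):
    """A_t(x, u) = Q_t(x, u) - V_t(x), where V_t(x) = Q_t(x, u*) is the optimal value."""
    u_star = greedy_control(a, b, r, p, s, w_t, x)
    return q_value(a, b, q, r, p, s, w_t, x, u) - q_value(a, b, q, r, p, s, w_t, x, u_star)


# Usage: the advantage matches the closed form (r + p b^2) (u - u*)^2.
a, b, q, r, p, s, w_t, x = 1.2, 1.0, 1.0, 1.0, 1.95, -0.7, 0.4, 1.5
u_star = greedy_control(a, b, r, p, s, w_t, x)
for u in (-1.0, 0.0, 2.0):
    print(u, advantage(a, b, q, r, p, s, w_t, x, u), (r + p * b * b) * (u - u_star) ** 2)
```

The identity holds for any p ≥ 0 and any s. In the setting sketched here, p would be the Riccati solution, and s would carry the contribution of future disturbances.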
Abstract
We introduce a new algorithm for online linear-quadratic control in a known system subject to adversarial disturbances. Existing regret bounds for this setting scale as √T unless strong stochastic assumptions are imposed on the disturbance process. We give the first algorithm with logarithmic regret for arbitrary adversarial disturbance sequences, provided the state and control costs are given by known quadratic functions. Our algorithm and analysis use a characterization for the optimal offline control law to reduce the online control problem to (delayed) online learning with approximate advantage functions. Compared to previous techniques, our approach does not need to control movement costs for the iterates, leading to logarithmic regret.
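The abstract reduces control to delayed online learning. The sketch below illustrates only that generic learning-theory ingredient, under assumptions of our own. Toy squared losses f_t(z) = (z - c_t)² with arbitrary c_t in [-1, 1] stand in for the paper's approximate advantage functions, and each c_t is revealed d rounds late. A standard reduction runs d + 1 independent follow-the-leader copies in round-robin order. This way, every copy has seen all of its own past feedback by the time it acts again. On these strongly convex losses, each copy's regret is logarithmic, so the total regret is of order (d + 1) log T. This is not the paper's algorithm, and all names are hypothetical.

```python
import math
import random
from collections import deque


def delayed_round_robin_ftl(cs, delay):
    """Play z_t against losses f_t(z) = (z - c_t)^2, where c_t is revealed only after `delay` rounds.

    Runs m = delay + 1 follow-the-leader copies in round-robin order; copy t % m acts in round t.
    """
    m = delay + 1
    sums, counts = [0.0] * m, [0] * m
    pending = deque()  # (round, c) pairs the learner has not seen yet
    plays = []
    for t, c in enumerate(cs):
        # Feedback from round s becomes visible at round s + m, i.e. `delay` rounds late.
        while pending and pending[0][0] + m <= t:
            s, c_s = pending.popleft()
            sums[s % m] += c_s
            counts[s % m] += 1
        j = t % m
        # Follow-the-leader for squared loss plays the running mean of the copy's own past c's.
        plays.append(sums[j] / counts[j] if counts[j] else 0.0)
        pending.append((t, c))
    return plays


def regret(cs, plays):
    """Learner's cumulative loss minus that of the best fixed point in hindsight (the mean of cs)."""
    best = sum(cs) / len(cs)
    return sum((z - c) ** 2 for z, c in zip(plays, cs)) - sum((best - c) ** 2 for c in cs)


def regret_bound(T, delay):
    """Our own bound for this toy: m copies, each with FTL regret at most 1 + 8 ln(n) over n rounds."""
    m = delay + 1
    return m * (1 + 8 * math.log(math.ceil(T / m)))


# Usage: an arbitrary sequence in [-1, 1]; regret stays below a bound of order (delay + 1) log T.
rng = random.Random(0)
cs = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
for d in (0, 3, 10):
    plays = delayed_round_robin_ftl(cs, d)
    print(d, round(regret(cs, plays), 2), round(regret_bound(len(cs), d), 2))
```

The bound in regret_bound comes from a be-the-leader argument for this toy loss and is not a bound from the paper. Summing per-copy regrets is valid because the best fixed point for all rounds can do no better than each copy's own best point on its rounds.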
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference
