Logarithmic Regret for Adversarial Online Control

Dylan J. Foster; Max Simchowitz

arXiv:2003.00189 · cs.LG · June 24, 2020 · 22 cites

TL;DR

This paper presents an online control algorithm that achieves logarithmic regret against adversarial disturbances in known linear-quadratic systems, improving on previous methods whose regret bounds scale as $\sqrt{T}$.

Contribution

It introduces the first algorithm with logarithmic regret for adversarial disturbances in known linear-quadratic control systems, using a new characterization of the optimal offline control law.

Findings

01

Achieves logarithmic regret in adversarial online control.

02

Reduces control problem to online learning with advantage functions.

03

Does not need to bound the movement costs of the iterates.

Abstract

We introduce a new algorithm for online linear-quadratic control in a known system subject to adversarial disturbances. Existing regret bounds for this setting scale as $\sqrt{T}$ unless strong stochastic assumptions are imposed on the disturbance process. We give the first algorithm with logarithmic regret for arbitrary adversarial disturbance sequences, provided the state and control costs are given by known quadratic functions. Our algorithm and analysis use a characterization for the optimal offline control law to reduce the online control problem to (delayed) online learning with approximate advantage functions. Compared to previous techniques, our approach does not need to control movement costs for the iterates, leading to logarithmic regret.
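
For concreteness, the setting described in the abstract can be written out as follows. This is a sketch in standard linear-quadratic notation; the symbols $A$, $B$, $Q$, $R$, $w_t$ and the benchmark policy class $\Pi$ are assumptions chosen for illustration and may differ from the paper's own notation.

```latex
% Known linear dynamics; the disturbance w_t is chosen adversarially
x_{t+1} = A x_t + B u_t + w_t

% Known quadratic state and control costs (Q, R are assumed names)
\ell(x, u) = x^\top Q x + u^\top R u

% Regret over T rounds against the best policy in an assumed benchmark class \Pi;
% (x_t^\pi, u_t^\pi) is the trajectory that \pi would produce on the same disturbances
\mathrm{Reg}_T = \sum_{t=1}^{T} \ell(x_t, u_t)
  - \inf_{\pi \in \Pi} \sum_{t=1}^{T} \ell(x_t^{\pi}, u_t^{\pi})
```

Read this way, the abstract's claim is that $\mathrm{Reg}_T$ grows only logarithmically in $T$ for arbitrary disturbance sequences.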

Videos

Logarithmic Regret for Adversarial Online Control · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference