Best of Both Worlds in Online Control: Competitive Ratio and Policy Regret

Gautam Goel; Naman Agarwal; Karan Singh; Elad Hazan

arXiv:2211.11219 · cs.LG · November 22, 2022


TL;DR

This paper bridges regret minimization and competitive analysis in online control of linear systems, showing that certain policies can achieve near-optimal performance in both metrics simultaneously, even without prior system knowledge.

Contribution

It demonstrates that a convex class of disturbance-action control (DAC) policies can approximate the optimal competitive policy, enabling algorithms to attain both sublinear regret and the optimal competitive ratio, up to an additive term that grows sublinearly in the time horizon.
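
As a rough guide to the two metrics involved, the guarantees can be written out as below. This is a sketch in assumed notation, not the paper's own: $J_T(\mathcal{A}; w)$ denotes the total cost of algorithm $\mathcal{A}$ over $T$ steps under disturbance sequence $w$, $\Pi_{\mathrm{DAC}}$ is the DAC policy class, $\mathrm{OPT}$ is taken to be the offline optimal controller that knows $w$ in advance, and $\alpha^{\star}$ is the optimal competitive ratio.

```latex
\begin{align*}
\mathrm{Regret}_T(\mathcal{A}) &= J_T(\mathcal{A}; w) - \min_{\pi \in \Pi_{\mathrm{DAC}}} J_T(\pi; w), \\
\mathrm{CR}(\mathcal{A}) &= \sup_{w} \frac{J_T(\mathcal{A}; w)}{J_T(\mathrm{OPT}; w)}, \\
J_T(\mathcal{A}; w) &\le \min\Bigl\{ \min_{\pi \in \Pi_{\mathrm{DAC}}} J_T(\pi; w),\; \alpha^{\star}\, J_T(\mathrm{OPT}; w) \Bigr\} + o(T).
\end{align*}
```

The last line combines the two claims: sublinear regret against the best DAC policy, and the optimal competitive ratio up to a sublinear additive correction.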

Findings

01

Algorithms achieve sublinear regret against the best DAC policy.

02

Algorithms attain the optimal competitive ratio, up to an additive correction that grows sublinearly in the time horizon.

03

Sublinear regret against the optimal competitive policy is attainable even when the system is unknown and no stabilizing controller is available a priori.

Abstract

We consider the fundamental problem of online control of a linear dynamical system from two different viewpoints: regret minimization and competitive analysis. We prove that the optimal competitive policy is well-approximated by a convex parameterized policy class, known as disturbance-action control (DAC) policies. Using this structural result, we show that several recently proposed online control algorithms achieve the best of both worlds: sublinear regret vs. the best DAC policy selected in hindsight, and optimal competitive ratio, up to an additive correction which grows sublinearly in the time horizon. We further conclude that sublinear regret vs. the optimal competitive policy is attainable when the linear dynamical system is unknown, and even when a stabilizing controller for the dynamics is not available a priori.
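
To make the DAC policy class concrete, here is a minimal sketch of how such a policy is commonly written in the online control literature: the action is a fixed linear feedback term plus a learned linear function of the last H disturbances. The function names, the quadratic cost, the zero initial state, and the dynamics x_{t+1} = A x_t + B u_t + w_t are assumptions made for illustration; the abstract itself specifies none of these details.

```python
import numpy as np

def dac_action(K, M, x, past_w):
    """Return u_t = -K x_t + sum_i M[i] @ past_w[i].

    past_w holds the most recent disturbances, newest first.
    The action is linear in the parameters M, which is what makes
    the policy class convex in its parameterization.
    """
    u = -K @ x
    for M_i, w in zip(M, past_w):
        u = u + M_i @ w
    return u

def rollout_cost(A, B, K, M, disturbances, Q, R):
    """Total quadratic cost of a fixed DAC policy from x_0 = 0 (assumed)."""
    n = A.shape[0]
    H = len(M)
    x = np.zeros(n)
    past_w = [np.zeros(n) for _ in range(H)]  # no disturbances seen yet
    total = 0.0
    for w in disturbances:
        u = dac_action(K, M, x, past_w)
        total += float(x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u + w            # assumed linear dynamics
        past_w = [w] + past_w[:-1]       # slide the disturbance window
    return total

# Toy scalar system, purely illustrative.
A = np.array([[0.5]])
B = np.array([[1.0]])
K = np.array([[0.0]])
Q = np.eye(1)
R = np.eye(1)
ws = [np.array([1.0]), np.array([0.0])]
cost_no_memory = rollout_cost(A, B, K, [np.zeros((1, 1))], ws, Q, R)
cost_with_memory = rollout_cost(A, B, K, [np.array([[-0.5]])], ws, Q, R)
```

An online algorithm in this setting would update the matrices in M over time rather than hold them fixed; the rollout above only evaluates one fixed member of the class.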


Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management

Methods: Dynamic Algorithm Configuration