Geometric Exploration for Online Control
Orestis Plevrakis, Elad Hazan

TL;DR
This paper introduces polynomial-time algorithms for controlling unknown linear dynamical systems under convex costs. Using a geometric exploration method, they achieve regret with optimal $\sqrt{T}$ dependence on the horizon, in both the known-cost and bandit-feedback settings.
Contribution
It presents the first polynomial-time algorithms with $\sqrt{T}$-type regret bounds for online control of unknown linear systems, based on a novel geometric exploration approach.
Findings
Achieves $n^3\sqrt{T}$ regret when the cost functions are known.
Gives a polynomial-time algorithm with $\text{poly}(n)\sqrt{T}$ regret under bandit feedback.
Improves the horizon dependence of the regret from the previous best $T^{2/3}$ to $\sqrt{T}$.
Abstract
We study the control of an unknown linear dynamical system under general convex costs. The objective is minimizing regret vs. the class of disturbance-feedback-controllers, which encompasses all stabilizing linear-dynamical-controllers. In this work, we first consider the case of known cost functions, for which we design the first polynomial-time algorithm with $n^3\sqrt{T}$-regret, where $n$ is the dimension of the state plus the dimension of control input. The $\sqrt{T}$-horizon dependence is optimal, and improves upon the previous best known bound of $T^{2/3}$. The main component of our algorithm is a novel geometric exploration strategy: we adaptively construct a sequence of barycentric spanners in the policy space. Second, we consider the case of bandit feedback, for which we give the first polynomial-time algorithm with $\text{poly}(n)\sqrt{T}$-regret, building on Stochastic…
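To make the notion of a barycentric spanner concrete, here is a minimal sketch of the classic determinant-swapping procedure (due to Awerbuch and Kleinberg) for a finite set of points in R^d. It is not the paper's adaptive construction in policy space, only an illustration of the underlying object: a subset of d points such that every point in the set is a linear combination of them with coefficients in [-C, C]. The function names (`det`, `with_row`, `barycentric_spanner`, `coefficients`) and the choice C = 2 are assumptions made for this example, and the input points are assumed to span R^d.

```python
def det(rows):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in r] for r in rows]
    n = len(m)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return result


def with_row(basis, i, x):
    """Copy of `basis` with row i replaced by the point x."""
    return basis[:i] + [list(x)] + basis[i + 1:]


def barycentric_spanner(points, C=2.0):
    """Return d points from `points` forming a C-approximate barycentric spanner.

    Assumes `points` is a finite set that spans R^d (hypothetical helper).
    """
    d = len(points[0])
    # Start from the standard basis, then swap in points one row at a time.
    basis = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for i in range(d):
        best = max(points, key=lambda x: abs(det(with_row(basis, i, x))))
        basis = with_row(basis, i, best)
    # Keep swapping while some point grows |det| by more than a factor C.
    changed = True
    while changed:
        changed = False
        current = abs(det(basis))
        for i in range(d):
            for x in points:
                if abs(det(with_row(basis, i, x))) > C * current:
                    basis = with_row(basis, i, x)
                    current = abs(det(basis))
                    changed = True
    return basis


def coefficients(basis, x):
    """Coefficients of x in `basis`, by Cramer's rule."""
    den = det(basis)
    return [det(with_row(basis, i, x)) / den for i in range(len(basis))]


if __name__ == "__main__":
    pts = [(1, 0), (0, 1), (1, 1), (2, 1), (-1, 3), (0.5, -2)]
    spanner = barycentric_spanner(pts, C=2.0)
    for x in pts:
        print(x, coefficients(spanner, x))
```

Each swap in the second phase multiplies the absolute determinant by more than C, and the determinant is bounded over a finite set, so the loop terminates. On exit, Cramer's rule gives the coefficient bound directly: each coefficient is a ratio of two determinants whose magnitude is at most C.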
Taxonomy
Topics: Advanced Bandit Algorithms Research · Advanced Control Systems Optimization · Advanced Optimization Algorithms Research
