Regret-Optimal Control for Finite-State Systems
Yishay Polatov, Oron Sabag

TL;DR
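This paper develops a control framework for finite-state systems driven by exogenous disturbances that minimizes dynamic regret relative to a lookahead benchmark. It is positioned as an alternative to classical MDP formulations, which assume i.i.d. disturbances, and to robust control, which optimizes against worst-case disturbances.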
Contribution
It introduces a nested dynamic-programming approach, built on the Regret-Bellman operator, that computes the optimal worst-case regret and a regret-optimal policy without requiring knowledge of the disturbance distribution.
Findings
Regret-optimal policies interpolate between MDP and robust controllers.
These policies outperform classical methods under various disturbance conditions.
The approach provides a new way to handle exogenous disturbances in finite-state systems.
Abstract
We study the control of finite-state systems driven by exogenous disturbances, and design causal policies that track the performance of a lookahead benchmark controller. This objective is formalized through dynamic regret, so that under favorable disturbance sequences the policy is compared against a strong benchmark, while under adverse disturbance sequences the comparison accounts for the benchmark's own degraded performance. This benchmark-relative framework provides an alternative to classical MDP formulations, which assume i.i.d. disturbances, and to robust control approaches, which optimize against worst-case disturbances. Our main result is a nested dynamic-programming solution that computes both the optimal worst-case regret and a regret-optimal policy. In particular, we introduce the Regret-Bellman operator, whose fixed-point value function feeds into a finite-horizon dynamic program. Numerical…
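To make the regret objective concrete, here is a minimal Python sketch on a hypothetical two-state system. The transition `step`, the stage cost `cost`, the `never_act` policy, and the assumption that the benchmark sees the entire disturbance sequence are all illustrative choices, not taken from the paper. The sketch evaluates worst-case regret by brute-force enumeration over disturbance sequences; it is not the paper's nested dynamic program and does not implement the Regret-Bellman operator.

```python
from itertools import product

# Hypothetical toy system (not from the paper): binary state, action, disturbance.
ACTIONS = (0, 1)
DISTURBANCES = (0, 1)

def step(s, a, w):
    """Assumed transition: the next state is the parity of s + a + w."""
    return (s + a + w) % 2

def cost(s, a, w):
    """Assumed stage cost: 1 for landing in state 1, plus 0.25 for acting."""
    return float(step(s, a, w) == 1) + 0.25 * a

def benchmark_cost(s0, ws):
    """Optimal cost for a controller that sees the whole sequence ws in advance."""
    value = {0: 0.0, 1: 0.0}  # terminal cost-to-go
    for w in reversed(ws):  # backward dynamic program over the known sequence
        value = {
            s: min(cost(s, a, w) + value[step(s, a, w)] for a in ACTIONS)
            for s in (0, 1)
        }
    return value[s0]

def policy_cost(policy, s0, ws):
    """Cost of a causal policy: the action depends only on (state, time)."""
    s, total = s0, 0.0
    for t, w in enumerate(ws):
        a = policy(s, t)
        total += cost(s, a, w)
        s = step(s, a, w)
    return total

def worst_case_regret(policy, s0, horizon):
    """Largest gap to the benchmark over all disturbance sequences."""
    return max(
        policy_cost(policy, s0, ws) - benchmark_cost(s0, ws)
        for ws in product(DISTURBANCES, repeat=horizon)
    )

def never_act(s, t):
    """Illustrative causal policy that always picks action 0."""
    return 0

print(worst_case_regret(never_act, s0=0, horizon=3))
```

Enumeration grows exponentially with the horizon, so it is only useful as a sanity check on small instances. Because the benchmark knows the disturbances, its cost never exceeds that of any causal policy on the same sequence, so the regret in this sketch is always non-negative.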
