Online Adaptive Policy Selection in Time-Varying Systems: No-Regret via Contractive Perturbations
Yiheng Lin, James A. Preiss, Emile Anand, Yingying Li, Yisong Yue, Adam Wierman

TL;DR
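This paper introduces GAPS, an online adaptive policy selection algorithm for systems with time-varying costs and dynamics. It achieves optimal policy regret when convexity holds, gives the first local regret bound for online policy selection when convexity does not hold, and adapts to changing environments more quickly than existing benchmarks in numerical experiments.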
Contribution
The paper presents GAPS (Gradient-based Adaptive Policy Selection) together with a general analytical framework for online policy selection via online optimization. GAPS achieves optimal policy regret in the convex setting and the first local regret bound in the non-convex setting.
Findings
GAPS is the first algorithm to achieve optimal policy regret when convexity holds.
In numerical experiments, GAPS adapts to changing environments more quickly than existing benchmarks.
When convexity does not hold, the analysis provides the first local regret bound for online policy selection.
Abstract
We study online adaptive policy selection in systems with time-varying costs and dynamics. We develop the Gradient-based Adaptive Policy Selection (GAPS) algorithm together with a general analytical framework for online policy selection via online optimization. Under our proposed notion of contractive policy classes, we show that GAPS approximates the behavior of an ideal online gradient descent algorithm on the policy parameters while requiring less information and computation. When convexity holds, our algorithm is the first to achieve optimal policy regret. When convexity does not hold, we provide the first local regret bound for online policy selection. Our numerical experiments show that GAPS can adapt to changing environments more quickly than existing benchmarks.
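The idea of running online gradient descent on policy parameters can be illustrated with a toy example. The sketch below is not the authors' GAPS algorithm; it is a minimal illustration under assumptions chosen here: a hypothetical scalar system x_{t+1} = a_t x_t + b u_t + w_t, a linear policy u_t = -theta x_t, a quadratic stage cost, and a recursively updated sensitivity estimate s_t of dx_t/dtheta computed along the actual trajectory. The function name, step size, cost weights, and projection interval [0, 1] are all illustrative. The projection keeps the closed-loop gain |a_t - theta| at or below 0.9, a simple stand-in for the contraction property the abstract refers to.

```python
import random


def run_online_policy_gradient(T=2000, eta=0.01, r=0.1, seed=0):
    """Toy online gradient descent on a scalar policy parameter.

    Hypothetical setup (not taken from the paper):
      dynamics  x_{t+1} = a_t * x_t + b * u_t + w_t
      policy    u_t = -theta * x_t
      cost      c_t = x_t**2 + r * u_t**2
    """
    rng = random.Random(seed)
    theta = 0.0   # policy parameter being tuned online
    x = 1.0       # system state
    s = 0.0       # running estimate of d x_t / d theta
    thetas, costs = [], []

    for t in range(T):
        # Time-varying dynamics: the open-loop gain switches halfway through.
        a = 0.9 if t < T // 2 else 0.5
        b = 1.0
        w = rng.uniform(-0.1, 0.1)  # bounded disturbance

        u = -theta * x
        costs.append(x * x + r * u * u)

        # Chain rule: du/dtheta includes the effect of theta on the state.
        du = -x - theta * s
        grad = 2.0 * x * s + 2.0 * r * u * du

        # Advance the state and its sensitivity using the current theta.
        x_next = a * x + b * u + w
        s_next = a * s + b * du

        # Projected gradient step; [0, 1] keeps |a - b * theta| <= 0.9.
        theta = min(max(theta - eta * grad, 0.0), 1.0)
        x, s = x_next, s_next
        thetas.append(theta)

    return thetas, costs


if __name__ == "__main__":
    thetas, costs = run_online_policy_gradient()
    print("final theta:", thetas[-1])
    print("mean cost, first 100 steps:", sum(costs[:100]) / 100)
    print("mean cost, last 100 steps:", sum(costs[-100:]) / 100)
```

Because theta changes at every step, the sensitivity recursion is only an approximation of the true derivative of the state with respect to a fixed parameter. The abstract's claim is that, for contractive policy classes, an approximation of this general kind stays close to an ideal online gradient descent while needing less information and computation; the sketch makes no attempt to reproduce the paper's bounds.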
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Optimization and Search Problems
