Online Adaptive Policy Selection in Time-Varying Systems: No-Regret via Contractive Perturbations

Yiheng Lin; James A. Preiss; Emile Anand; Yingying Li; Yisong Yue; Adam Wierman

arXiv:2210.12320 · math.OC · June 14, 2023 · NeurIPS

TL;DR

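This paper introduces GAPS (Gradient-based Adaptive Policy Selection), an online algorithm for selecting policy parameters in systems with time-varying costs and dynamics. GAPS achieves optimal policy regret when convexity holds, comes with a local regret bound when it does not, and adapts to changing environments more quickly than existing benchmarks in the authors' numerical experiments.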

Contribution

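The paper presents GAPS together with a general analytical framework for online policy selection via online optimization. Under the proposed notion of contractive policy classes, GAPS is the first algorithm to achieve optimal policy regret in the convex setting, and the paper gives the first local regret bound for online policy selection in the non-convex setting.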

Findings

01

When convexity holds, GAPS is the first algorithm to achieve optimal policy regret.

02

In numerical experiments, GAPS adapts to changing environments more quickly than existing benchmarks.

03

When convexity does not hold, the paper provides the first local regret bound for online policy selection.

Abstract

We study online adaptive policy selection in systems with time-varying costs and dynamics. We develop the Gradient-based Adaptive Policy Selection (GAPS) algorithm together with a general analytical framework for online policy selection via online optimization. Under our proposed notion of contractive policy classes, we show that GAPS approximates the behavior of an ideal online gradient descent algorithm on the policy parameters while requiring less information and computation. When convexity holds, our algorithm is the first to achieve optimal policy regret. When convexity does not hold, we provide the first local regret bound for online policy selection. Our numerical experiments show that GAPS can adapt to changing environments more quickly than existing benchmarks.
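
The idea of running gradient descent online on policy parameters can be illustrated with a toy example. The sketch below is not the GAPS algorithm from the paper or its repository; it is a simplified illustration under assumptions chosen here: a scalar linear system x_{t+1} = a_t x_t + b u_t + w_t whose coefficient a_t changes halfway through the run, a linear feedback policy u_t = -theta x_t, a quadratic cost x_t^2 + u_t^2, and a projected gradient step on theta. The function name, the step size eta, the projection interval, and the noise range are all hypothetical.

```python
import random


def run_online_gain_tuning(steps=200, eta=0.01, seed=0):
    """Toy online gradient tuning of a scalar feedback gain (illustrative only)."""
    rng = random.Random(seed)
    theta = 0.0        # policy parameter: feedback gain in u = -theta * x
    x = 1.0            # system state
    s = 0.0            # running estimate of d x_t / d theta
    lo, hi = 0.0, 1.5  # projection interval for theta (assumed)
    total_cost = 0.0

    for t in range(steps):
        # Time-varying dynamics: the open-loop coefficient changes mid-run.
        a = 0.9 if t < steps // 2 else 1.1
        b = 1.0
        w = rng.uniform(-0.1, 0.1)  # bounded disturbance

        u = -theta * x
        total_cost += x * x + u * u

        # Chain rule: u depends on theta directly and through the state x.
        du = -x - theta * s
        grad = 2 * x * s + 2 * u * du

        # Advance the state and its sensitivity. The sensitivity recursion is
        # exact only if theta were held fixed, so it is an approximation here.
        x_next = a * x + b * u + w
        s_next = a * s + b * du

        # Projected gradient step on the policy parameter.
        theta = min(hi, max(lo, theta - eta * grad))
        x, s = x_next, s_next

    return theta, total_cost


if __name__ == "__main__":
    final_theta, cost = run_online_gain_tuning()
    print(f"final gain: {final_theta:.3f}, cumulative cost: {cost:.3f}")
```

The sketch keeps a single running sensitivity term instead of differentiating through the whole trajectory at every step, which is one simple way to keep per-step computation constant. How GAPS itself limits the information and computation it needs is specified in the paper, not here.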

Code & Models

Repositories

jpreiss/adaptive_policy_selection

Videos

Online Adaptive Policy Selection in Time-Varying Systems: No-Regret via Contractive Perturbations · slideslive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Optimization and Search Problems