COMPASS-Hedge: Learning Safely Without Knowing the World
Ting Hu, Luanda Cai, Manolis Vlatakis

TL;DR
COMPASS-Hedge is a novel full-information online learning algorithm that unifies adversarial, stochastic, and baseline safety guarantees without prior environment knowledge.
Contribution
It introduces the first full-information method achieving optimal regret in adversarial, stochastic, and baseline safety regimes simultaneously, without requiring problem-dependent parameters.
Findings
Achieves minimax-optimal regret in adversarial settings.
Attains instance-optimal, gap-dependent regret in stochastic environments.
Ensures near-constant regret relative to a baseline policy.
Abstract
Online learning algorithms often face a fundamental trilemma: balancing regret guarantees between adversarial and stochastic settings while providing baseline safety against a fixed comparator. While existing methods excel in one or two of these regimes, they typically fail to unify all three without sacrificing optimal rates or requiring oracle access to problem-dependent parameters. In this work, we bridge this gap by introducing COMPASS-Hedge. Our algorithm is the first full-information method to simultaneously achieve: i) minimax-optimal regret in adversarial environments; ii) instance-optimal, gap-dependent regret in stochastic environments; and iii) near-constant regret relative to a designated baseline policy, up to logarithmic factors. Crucially, COMPASS-Hedge is parameter-free and requires no prior knowledge of the environment's nature or the magnitude of the…
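For readers new to the full-information setting, the sketch below shows the classic Hedge (exponential weights) update and how regret against the best expert or against a designated baseline expert can be measured. It is not COMPASS-Hedge: the fixed learning rate `eta`, the function names `hedge` and `regret`, and the `comparator` argument are illustrative assumptions, and classic Hedge needs `eta` tuned to the horizon, which is exactly the kind of dependence the abstract says COMPASS-Hedge avoids.

```python
import math
import random


def hedge(loss_rounds, eta):
    """Classic Hedge over K experts (illustrative; not COMPASS-Hedge).

    loss_rounds: list of per-round loss vectors, each entry in [0, 1].
    eta: fixed learning rate (an assumption; classic Hedge must tune it).
    Returns (learner's expected cumulative loss, per-expert cumulative losses).
    """
    k = len(loss_rounds[0])
    cum_loss = [0.0] * k
    learner_loss = 0.0
    for losses in loss_rounds:
        # Weights proportional to exp(-eta * cumulative loss);
        # subtracting the minimum keeps the exponentials numerically stable.
        m = min(cum_loss)
        weights = [math.exp(-eta * (c - m)) for c in cum_loss]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss of the learner's randomized choice this round.
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        # Full information: the loss of every expert is observed.
        cum_loss = [c + l for c, l in zip(cum_loss, losses)]
    return learner_loss, cum_loss


def regret(loss_rounds, eta, comparator=None):
    """Regret against the best expert, or against a fixed baseline expert index."""
    learner_loss, cum_loss = hedge(loss_rounds, eta)
    target = min(cum_loss) if comparator is None else cum_loss[comparator]
    return learner_loss - target


if __name__ == "__main__":
    rng = random.Random(0)
    T, K = 1000, 5
    rounds = [[rng.random() for _ in range(K)] for _ in range(T)]
    eta = math.sqrt(8 * math.log(K) / T)  # standard horizon-dependent tuning
    print("regret vs best expert:", regret(rounds, eta))
    print("regret vs baseline expert 0:", regret(rounds, eta, comparator=0))
```

With the tuning shown, the standard analysis of Hedge bounds regret against the best expert by sqrt(T ln K / 2) for losses in [0, 1]. Regret against any single baseline expert can never exceed regret against the best expert, which is why a dedicated baseline guarantee is only interesting when it is much smaller than the adversarial rate.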
