On the Peril of (Even a Little) Nonstationarity in Satisficing Regret Minimization
Yixuan Zhang, Ruihao Zhu, Qiaomin Xie

TL;DR
This paper investigates the impact of even minimal nonstationarity on satisficing regret in multi-armed bandits, revealing that regret necessarily scales with time in nonstationary environments.
Contribution
It introduces a novel Fano-based analytical framework for nonstationary bandits and characterizes the regret scaling with the number of stationary segments.
Findings
Optimal regret scales as Θ(L log T) with L stationary segments.
In the stationary case (L=1), constant satisficing regret is achievable.
Even slight nonstationarity causes regret to grow with T.
Abstract
Motivated by the principle of satisficing in decision-making, we study satisficing regret guarantees for nonstationary K-armed bandits. We show that in the general realizable, piecewise-stationary setting with L stationary segments, the optimal regret is Θ(L log T) as long as L ≥ 2. This stands in sharp contrast to the case of L = 1 (i.e., the stationary setting), where a T-independent satisficing regret is achievable under realizability. In other words, the optimal regret has to scale with T even if just a little nonstationarity is present. A key ingredient in our analysis is a novel Fano-based framework tailored to nonstationary bandits via a post-interaction reference construction. This framework strictly extends the classical Fano method for passive estimation as well as recent interactive Fano techniques for stationary bandits. As a complement,…
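To make the quantity in the abstract concrete, here is a minimal sketch of how satisficing regret could be computed for a piecewise-stationary instance. The abstract does not state the definition, so the code assumes the form commonly used in the satisficing-bandit literature: each round contributes max(S - mean of the pulled arm, 0), where S is a satisficing threshold, and realizability means every segment has at least one arm whose mean reaches S. The function name, argument names, and the numbers in the usage example are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def satisficing_regret(
    segment_means: Sequence[Sequence[float]],
    segment_lengths: Sequence[int],
    actions: Sequence[int],
    threshold: float,
) -> float:
    """Assumed definition: sum over rounds of max(threshold - mean(pulled arm), 0).

    segment_means[i][k] is the mean reward of arm k during stationary segment i,
    segment_lengths[i] is the number of rounds in segment i, and actions[t] is
    the arm pulled in round t.
    """
    if len(segment_means) != len(segment_lengths):
        raise ValueError("one length is required per segment")
    if sum(segment_lengths) != len(actions):
        raise ValueError("actions must cover exactly all rounds")
    # Realizability (assumed form): some arm meets the threshold in every segment.
    for means in segment_means:
        if max(means) < threshold:
            raise ValueError("instance is not realizable for this threshold")

    regret = 0.0
    start = 0
    for means, length in zip(segment_means, segment_lengths):
        for arm in actions[start:start + length]:
            # Only the shortfall below the threshold is penalized.
            regret += max(threshold - means[arm], 0.0)
        start += length
    return regret


# Hypothetical instance with K = 2 arms and L = 2 segments of 3 rounds each;
# the satisficing arm switches from arm 0 to arm 1 at the change point.
example_means = [[0.75, 0.25], [0.25, 0.75]]
example_lengths = [3, 3]
example_actions = [0, 0, 0, 0, 0, 1]  # keeps pulling arm 0 for two rounds after the change
example_regret = satisficing_regret(example_means, example_lengths, example_actions, threshold=0.5)
```

In this toy instance the learner incurs no regret in the first segment and pays only for the rounds it spends on the formerly satisficing arm after the change, which illustrates why undetected change points are the source of regret in the nonstationary setting.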
