DAL: A Practical Prior-Free Black-Box Framework for Non-Stationary Bandits

Argyrios Gerogiannis; Yu-Han Huang; Subhonmesh Bose; Venugopal V. Veeravalli

arXiv:2501.19401·cs.LG·September 30, 2025


Open Access · 3 Reviews

TL;DR

DAL is a prior-free framework that augments stationary bandit algorithms with change detection so that they can handle non-stationary environments. Its effectiveness is supported by extensive experiments and theoretical insights.

Contribution

Introduces DAL, a practical black-box framework that augments any stationary bandit algorithm with change detection for non-stationary bandit problems without prior knowledge.

Findings

01 · DAL outperforms state-of-the-art methods across diverse non-stationary scenarios

02 · DAL is applicable to all common bandit variants

03 · DAL shows strong empirical performance, supported by theoretical insights

Abstract

We introduce a practical, black-box framework termed Detection Augmented Learning (DAL) for the problem of non-stationary bandits without prior knowledge of the underlying non-stationarity. DAL accepts any stationary bandit algorithm as input and augments it with a change detector, enabling applicability to all common bandit variants. Extensive experimentation demonstrates that DAL consistently surpasses current state-of-the-art methods across diverse non-stationary scenarios, including synthetic benchmarks and real-world datasets, underscoring its versatility and scalability. We provide theoretical insights into DAL's strong empirical performance, complemented by thorough experimental validation.
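The wrapping idea in the abstract (a stationary bandit algorithm plus a change detector) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the stationary learner is plain UCB1, the detector is a hypothetical two-half sliding-window mean comparison, the wrapper restarts the learner on detection, and a small uniform forced-exploration probability keeps every arm monitored. The names `UCB1`, `WindowDetector`, `DetectionAugmentedBandit`, and `run_demo` are all hypothetical.

```python
import math
import random

class UCB1:
    """Stationary UCB1 learner, standing in for 'any stationary bandit algorithm'."""
    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.reset()

    def reset(self):
        self.counts = [0] * self.n_arms
        self.sums = [0.0] * self.n_arms
        self.t = 0

    def select(self):
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm  # play every arm once before using the index
        return max(range(self.n_arms),
                   key=lambda a: self.sums[a] / self.counts[a]
                   + math.sqrt(2.0 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward
        self.t += 1

class WindowDetector:
    """Hypothetical detector: flags a change when the means of the older and
    newer halves of a sliding window differ by more than `threshold`."""
    def __init__(self, window=40, threshold=0.5):
        self.window, self.threshold = window, threshold
        self.samples = []

    def reset(self):
        self.samples = []

    def update(self, x):
        self.samples.append(x)
        if len(self.samples) > self.window:
            self.samples.pop(0)
        if len(self.samples) < self.window:
            return False  # not enough data to compare yet
        half = self.window // 2
        old = sum(self.samples[:half]) / half
        new = sum(self.samples[half:]) / (self.window - half)
        return abs(new - old) > self.threshold

class DetectionAugmentedBandit:
    """Wraps a stationary learner; restarts it when any arm's detector fires."""
    def __init__(self, learner, n_arms, explore_prob=0.1, seed=0):
        self.learner, self.n_arms = learner, n_arms
        self.detectors = [WindowDetector() for _ in range(n_arms)]
        self.explore_prob = explore_prob  # assumed forced-exploration rate
        self.rng = random.Random(seed)
        self.restarts = 0

    def select(self):
        if self.rng.random() < self.explore_prob:
            return self.rng.randrange(self.n_arms)  # forced exploration
        return self.learner.select()

    def update(self, arm, reward):
        self.learner.update(arm, reward)
        if self.detectors[arm].update(reward):
            self.learner.reset()  # discard statistics from the old regime
            for detector in self.detectors:
                detector.reset()
            self.restarts += 1

def run_demo(horizon=1000, change_at=500):
    """Two arms with noiseless rewards whose means swap at `change_at`."""
    agent = DetectionAugmentedBandit(UCB1(2), n_arms=2)
    late_best = 0
    for t in range(horizon):
        means = (0.9, 0.1) if t < change_at else (0.1, 0.9)
        arm = agent.select()
        agent.update(arm, means[arm])
        if t >= horizon - 200 and arm == 1:
            late_best += 1
    # Returns the restart count and the share of the last 200 pulls on arm 1.
    return agent.restarts, late_best / 200

if __name__ == "__main__":
    print(run_demo())
```

The learner only needs `select`, `update`, and `reset`, which is what makes the wrapper black-box in this sketch: swapping UCB1 for another stationary algorithm does not touch the detection logic. The detector and exploration rate here were chosen for brevity, so they should not be read as the paper's choices.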

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. Many related works are discussed. 2. Numerical experiments are conducted on various datasets.

Weaknesses

This work presents a set of numerical results and a set of analytical results, but neither fully convinces me of the algorithm's superiority. I wonder what the key contribution or focus of the work is. Some key concerns are listed below: 1. Abstract: It is claimed that 'DAL accepts any stationary bandit algorithm as input', while the propositions and theorems (e.g. Theorem 4.4) come with assumptions and conditions. This is somewhat confusing. 2. Line 28: It is claimed that 'MABs fall into ... PB, NPB, C

Reviewer 02 · Rating 4 · Confidence 3

Strengths

This is a nice problem, and one that has been considered by many authors over the years. The approach, while fairly simple, is effective. The experiments seem justifiable and demonstrate the performance of the method.

Weaknesses

The paper is at times hard to digest and understand. Tuning the methods seems challenging, and the authors do not convince the reader otherwise. For instance, no details are provided on the construction of the covering set. Questions: What if the process contains a mix of abrupt and gradual changes? Can this method be augmented with memory, allowing it to return to previous regimes instead of effectively starting from scratch every time?

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. The method provides an algorithm with theoretical guarantees that does not rely on prior knowledge of the environment, and it also shows strong empirical performance. 2. The method is general: it acts as a black-box change detector that can be wrapped around different types of bandit algorithms, and it works across multiple bandit settings.

Weaknesses

1. The method does not provide a theoretical guarantee for the drifting case. This is expected, because the change-detection mechanism is designed for abrupt changes, not for drifting changes. The paper only shows empirical performance in the drifting case, but bandits are primarily a theoretical setting, so a matching optimal regret guarantee there is important and is currently missing. 2. Compared to MASTER, this paper’s analysis in the piecewise-stationary setting relies on an extra assumption:


Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Data Stream Mining Techniques