DAL: A Practical Prior-Free Black-Box Framework for Non-Stationary Bandits
Argyrios Gerogiannis, Yu-Han Huang, Subhonmesh Bose, Venugopal V. Veeravalli

TL;DR
DAL is a versatile, prior-free framework that augments stationary bandit algorithms with change detection so they can handle non-stationary environments. The authors support it with extensive experiments and theoretical insights.
Contribution
Introduces DAL, a practical black-box framework that augments any stationary bandit algorithm with change detection for non-stationary bandit problems without prior knowledge.
Findings
DAL outperforms state-of-the-art methods across diverse non-stationary scenarios, including synthetic benchmarks and real-world datasets
It is applicable to all common bandit variants
It shows strong empirical performance, supported by theoretical insights
Abstract
We introduce a practical, black-box framework termed Detection Augmented Learning (DAL) for the problem of non-stationary bandits without prior knowledge of the underlying non-stationarity. DAL accepts any stationary bandit algorithm as input and augments it with a change detector, enabling applicability to all common bandit variants. Extensive experimentation demonstrates that DAL consistently surpasses current state-of-the-art methods across diverse non-stationary scenarios, including synthetic benchmarks and real-world datasets, underscoring its versatility and scalability. We provide theoretical insights into DAL's strong empirical performance, complemented by thorough experimental validation.
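The wrapper idea in the abstract (a stationary bandit algorithm as input, augmented with a change detector) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the class names `DetectionAugmentedBandit` and `WindowMeanDetector`, the two-window mean-comparison test used as the detector, the round-robin forced-exploration schedule `explore_every`, and all thresholds. UCB1 merely stands in for "any stationary bandit algorithm".

```python
import math
import random
from collections import deque


class UCB1:
    """Stationary base learner; a stand-in for any stationary bandit algorithm."""

    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.reset()

    def reset(self):
        self.counts = [0] * self.n_arms
        self.sums = [0.0] * self.n_arms
        self.t = 0

    def select_arm(self):
        # Pull each arm once before using the confidence bounds.
        for a in range(self.n_arms):
            if self.counts[a] == 0:
                return a
        ucb = [self.sums[a] / self.counts[a]
               + math.sqrt(2.0 * math.log(self.t) / self.counts[a])
               for a in range(self.n_arms)]
        return max(range(self.n_arms), key=lambda a: ucb[a])

    def update(self, arm, reward):
        self.t += 1
        self.counts[arm] += 1
        self.sums[arm] += reward


class WindowMeanDetector:
    """Hypothetical detector: flags a change when the means of the older and
    newer halves of a sliding buffer differ by more than `threshold`."""

    def __init__(self, window=50, threshold=0.4):
        self.window = window
        self.threshold = threshold
        self.buf = deque(maxlen=2 * window)

    def reset(self):
        self.buf.clear()

    def update(self, x):
        self.buf.append(x)
        if len(self.buf) < 2 * self.window:
            return False
        vals = list(self.buf)
        old = sum(vals[:self.window]) / self.window
        new = sum(vals[self.window:]) / self.window
        return abs(new - old) > self.threshold


class DetectionAugmentedBandit:
    """Black-box wrapper: restarts the base learner when any detector fires."""

    def __init__(self, base, detectors, explore_every=10):
        self.base = base
        self.detectors = detectors
        self.explore_every = explore_every  # assumed forced-exploration period
        self.step = 0
        self.restarts = 0

    def select_arm(self):
        self.step += 1
        # Forced exploration (assumption): periodically cycle through the arms
        # so every arm's detector keeps receiving fresh samples.
        if self.step % self.explore_every == 0:
            return (self.step // self.explore_every) % self.base.n_arms
        return self.base.select_arm()

    def update(self, arm, reward):
        self.base.update(arm, reward)
        if self.detectors[arm].update(reward):
            # Change flagged: discard all statistics and start over.
            self.base.reset()
            for d in self.detectors:
                d.reset()
            self.restarts += 1


def run(horizon=4000, change_at=2000, seed=0):
    """Two Bernoulli arms whose means swap abruptly at `change_at`."""
    rng = random.Random(seed)
    means_before, means_after = [0.9, 0.1], [0.1, 0.9]
    agent = DetectionAugmentedBandit(
        UCB1(2), [WindowMeanDetector() for _ in range(2)])
    pulls_after = [0, 0]
    for t in range(horizon):
        means = means_before if t < change_at else means_after
        arm = agent.select_arm()
        reward = 1.0 if rng.random() < means[arm] else 0.0
        agent.update(arm, reward)
        if t >= change_at:
            pulls_after[arm] += 1
    return agent.restarts, pulls_after
```

Calling `run()` simulates one abrupt change and returns the number of restarts together with the per-arm pull counts after the change. The base learner is only touched through `select_arm`, `update`, and `reset`, which is what makes the wrapper black-box in this sketch: any learner exposing those three methods could be substituted for `UCB1`.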
Peer Reviews
Decision · Submitted to ICLR 2026
1. Many related works are discussed. 2. Numerical experiments are conducted on various datasets.
This work presents a set of numerical results and a set of analytical results, but neither fully convinces me of the algorithm's superiority. I wonder what the key contribution or focus of the work is. Some key concerns are below: 1. Abstract: It is claimed that 'DAL accepts any stationary bandit algorithm as input', while the propositions and theorems (e.g. Theorem 4.4) come with assumptions and conditions. This is somewhat confusing. 2. Line 28: It is claimed that 'MABs fall into ... PB, NPB, C
Strengths: this is a nice problem, and one that has been considered by many authors over the years. The approach, while fairly simple, is effective. The experiments seem to be justifiable and demonstrate the performance of the method.
Weaknesses: the paper is not easy to digest and understand at times. Tuning the method seems challenging, and the authors do not convince the reader otherwise. For instance, no details are provided on the construction of the covering set. Questions: what if the process contains a mix of abrupt and gradual changes? Can this method be augmented with memory, allowing it to return to previous regimes instead of effectively starting from scratch every time?
1. The method provides an algorithm with theoretical guarantees that does not rely on prior knowledge of the environment, and it also shows strong empirical performance. 2. The method is general: it acts as a black-box change detector that can be wrapped around different types of bandit algorithms, and it works across multiple bandit settings.
1. The method does not provide a theoretical guarantee for the drifting case. This is expected, because the change-detection mechanism is designed for abrupt changes, not for drifting changes. The paper only shows empirical performance in the drifting case, but bandits are primarily a theoretical setting, so a matching optimal regret guarantee is important there and is currently missing. 2. Compared to MASTER, this paper’s analysis in the piecewise-stationary setting relies on an extra assumption:
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Data Stream Mining Techniques
