DARLING: Detection Augmented Reinforcement Learning with Non-Stationary Guarantees
Argyrios Gerogiannis, Yu-Han Huang, Venugopal V. Veeravalli

TL;DR
DARLING is a modular reinforcement learning framework that adapts to non-stationary environments without prior knowledge of the non-stationarity. It improves on the best known theoretical guarantees and outperforms existing methods across diverse benchmarks.
Contribution
It introduces DARLING, the first approach with minimax optimal guarantees for non-stationary RL in tabular and linear MDPs, with extensive empirical validation.
Findings
DARLING improves the best known dynamic regret bounds in tabular MDPs under change-point separability and reachability conditions.
It matches minimax lower bounds in linear MDPs when reachability parameters are known.
DARLING outperforms state-of-the-art methods across various non-stationary benchmarks.
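For reference, dynamic regret in finite-horizon episodic RL is commonly defined as the cumulative gap between the optimal value of the MDP active in each episode and the value of the policy actually played. The notation below is an assumption and may differ from the paper's: $K$ episodes, policy $\pi_k$ and initial state $s_1^k$ in episode $k$, and $V_1^{\star,k}$, $V_1^{\pi_k,k}$ the step-1 optimal and policy values under the MDP $\mathcal{M}_k$ of episode $k$. In the piecewise-stationary setting, $\mathcal{M}_k$ is constant between unknown change points.

```latex
\mathrm{DynReg}(K) \;=\; \sum_{k=1}^{K} \Big( V_1^{\star,k}(s_1^k) \;-\; V_1^{\pi_k,k}(s_1^k) \Big)
```

Unlike static regret, the comparator changes with $k$, so an algorithm that never forgets data from before a change point can accumulate regret linearly after it.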
Abstract
We study model-free reinforcement learning (RL) in non-stationary finite-horizon episodic Markov decision processes (MDPs) without prior knowledge of the non-stationarity. We focus on the piecewise stationary (PS) setting, where both rewards and transition dynamics can change at unknown times. We first revisit existing state-of-the-art approaches and identify theoretical and practical limitations that change the current landscape of performance guarantees. To characterize the difficulty of the problem, we establish the first minimax lower bounds for PS-RL in tabular and linear MDPs. We then introduce Detection Augmented Reinforcement Learning (DARLING), a modular wrapper for PS-RL that applies to both tabular and linear MDPs, without knowledge of the changes. In tabular MDPs, under change-point separability and reachability conditions, DARLING improves the best known dynamic regret…
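The abstract describes DARLING as a modular wrapper that augments an RL algorithm with change detection. The general detect-and-restart pattern can be sketched as follows. This is a generic illustration under stated assumptions, not DARLING's actual algorithm. The learner interface (`run_episode`, `reset`), the windowed mean-shift test, and the parameters `window` and `delta` are all hypothetical choices. Monitoring only per-episode returns in [0, 1] is also a simplification, since in the PS setting both rewards and transitions can change.

```python
import math
from collections import deque
from typing import Deque, List, Protocol


class BaseLearner(Protocol):
    """Hypothetical interface for any stationary episodic RL algorithm."""
    def run_episode(self) -> float: ...
    def reset(self) -> None: ...


class WindowMeanShiftDetector:
    """Illustrative change detector (NOT the test used by DARLING).

    Compares the means of two adjacent windows of observations in [0, 1].
    With no change, Hoeffding's inequality bounds the probability that the
    two window means differ by more than sqrt(log(2/delta)/window) by delta.
    """

    def __init__(self, window: int, delta: float) -> None:
        self.window = window
        self.threshold = math.sqrt(math.log(2.0 / delta) / window)
        self.buf: Deque[float] = deque(maxlen=2 * window)

    def update(self, x: float) -> bool:
        self.buf.append(x)
        if len(self.buf) < 2 * self.window:
            return False  # not enough data to compare two full windows yet
        data = list(self.buf)
        old_mean = sum(data[: self.window]) / self.window
        new_mean = sum(data[self.window:]) / self.window
        return abs(new_mean - old_mean) > self.threshold

    def reset(self) -> None:
        self.buf.clear()


def detection_augmented_run(learner: BaseLearner,
                            detector: WindowMeanShiftDetector,
                            num_episodes: int) -> List[int]:
    """Run the base learner; restart it whenever the detector fires.

    Returns the (0-indexed) episodes at which a restart was triggered.
    """
    restarts: List[int] = []
    for k in range(num_episodes):
        ret = learner.run_episode()
        if detector.update(ret):
            learner.reset()   # discard estimates from the old segment
            detector.reset()  # start monitoring the new segment afresh
            restarts.append(k)
    return restarts


class ToyLearner:
    """Hypothetical stand-in: return jumps from 0.2 to 0.9 at episode 50."""

    def __init__(self) -> None:
        self.episode = 0
        self.resets = 0

    def run_episode(self) -> float:
        ret = 0.2 if self.episode < 50 else 0.9
        self.episode += 1
        return ret

    def reset(self) -> None:
        self.resets += 1  # a real learner would clear its value estimates


if __name__ == "__main__":
    learner = ToyLearner()
    detector = WindowMeanShiftDetector(window=10, delta=0.05)
    print(detection_augmented_run(learner, detector, num_episodes=100))  # prints [58]
```

The detection delay (the change at episode 50 is flagged at episode 58) reflects the usual trade-off: a larger `window` lowers the false-alarm rate and detects smaller shifts but reacts more slowly. Separability conditions of the kind mentioned in the abstract are what make such delays controllable in theory.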
