Tracking Drift: Variation-Aware Entropy Scheduling for Non-Stationary Reinforcement Learning
Tongxi Wang, Zhuoyang Xia, Xinran Chen, Shan Liu

TL;DR
This paper introduces AES, an adaptive entropy scheduling method for non-stationary reinforcement learning that dynamically adjusts exploration based on environment drift, improving stability and recovery.
Contribution
It formulates entropy scheduling as a dynamic-regret trade-off and proposes AES, a minimally invasive method that adapts entropy coefficients online using observable drift proxies.
Findings
AES reduces performance degradation caused by environment drift.
AES accelerates recovery after abrupt changes in the environment.
The method is effective across multiple algorithms, tasks, and drift modes.
Abstract
Real-world reinforcement learning often faces environment drift, but most existing methods rely on static entropy coefficients/target entropy, causing over-exploration during stable periods and under-exploration after drift, and leaving unanswered the principled question of how exploration intensity should scale with drift magnitude. We show that, under standard assumptions, entropy scheduling in non-stationary maximum-entropy RL can be cast as the dynamic-regret trade-off between tracking a drifting comparator and stabilizing updates, yielding a square-root scaling rule for the entropy weight in terms of an online non-stationarity proxy. Building on this, we propose AES--Adaptive Entropy Scheduling--which adaptively adjusts the entropy coefficient/temperature online using observable drift proxies during training, requiring almost no structural changes and incurring minimal overhead.…
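To make the square-root scaling idea concrete, here is a minimal Python sketch of an entropy-coefficient scheduler. It is an illustration under stated assumptions, not the authors' AES implementation: the class name `SqrtEntropyScheduler`, the exponential-moving-average drift proxy, the `scale` constant, and the clipping bounds are all hypothetical choices, since the abstract does not specify them. The only element taken from the abstract is that the entropy weight grows with the square root of an online non-stationarity proxy.

```python
import math


class SqrtEntropyScheduler:
    """Hypothetical scheduler: entropy weight ~ sqrt(online drift proxy).

    All names and constants here are illustrative assumptions; the source
    abstract only states a square-root scaling rule, not its exact form.
    """

    def __init__(self, alpha_min=0.01, alpha_max=1.0, scale=1.0, ema_decay=0.9):
        self.alpha_min = alpha_min    # floor: keep a little exploration when stable
        self.alpha_max = alpha_max    # ceiling: avoid runaway exploration
        self.scale = scale            # assumed proportionality constant
        self.ema_decay = ema_decay    # smoothing factor for the drift proxy
        self.drift = 0.0              # running estimate of non-stationarity

    def update(self, drift_signal):
        """Fold one observed drift signal into the proxy and return alpha.

        `drift_signal` is any observable scalar assumed to grow with
        environment change (for example, a shift in recent returns).
        """
        # Exponential moving average of the signal's magnitude.
        self.drift = (self.ema_decay * self.drift
                      + (1.0 - self.ema_decay) * abs(drift_signal))
        # Square-root scaling, then clip to the allowed range.
        alpha = self.scale * math.sqrt(self.drift)
        return min(self.alpha_max, max(self.alpha_min, alpha))
```

In a training loop, one would call `update` once per iteration with the chosen drift signal and pass the returned value to the learner as its entropy coefficient or temperature. With this design the coefficient stays near `alpha_min` during stable periods and rises after a detected change, which mirrors the over-exploration and under-exploration problem described in the abstract.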
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Advanced Bandit Algorithms Research
