Tracking Drift: Variation-Aware Entropy Scheduling for Non-Stationary Reinforcement Learning

Tongxi Wang; Zhuoyang Xia; Xinran Chen; Shan Liu

arXiv:2601.19624 · cs.LG · May 19, 2026

TL;DR

This paper introduces AES, an adaptive entropy scheduling method for non-stationary reinforcement learning that dynamically adjusts exploration based on environment drift, improving stability and recovery.

Contribution

It formulates entropy scheduling as a dynamic-regret trade-off and proposes AES, a minimally invasive method that adapts entropy coefficients online using observable drift proxies.

Findings

01. AES reduces performance degradation caused by environment drift.

02. AES accelerates recovery after abrupt changes in the environment.

03. The method is effective across multiple algorithms, tasks, and drift modes.

Abstract

Real-world reinforcement learning often faces environment drift, but most existing methods rely on a static entropy coefficient or target entropy. This causes over-exploration during stable periods and under-exploration after drift, and leaves open the principled question of how exploration intensity should scale with drift magnitude. We show that, under standard assumptions, entropy scheduling in non-stationary maximum-entropy RL can be cast as a dynamic-regret trade-off between tracking a drifting comparator and stabilizing updates, yielding a square-root scaling rule for the entropy weight in terms of an online non-stationarity proxy. Building on this, we propose AES (Adaptive Entropy Scheduling), which adjusts the entropy coefficient/temperature online using observable drift proxies during training, requiring almost no structural changes and incurring minimal overhead.…
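To make the square-root scaling idea concrete, here is a minimal Python sketch of an entropy coefficient that grows with the square root of an online drift proxy. It is an illustration under stated assumptions, not the paper's AES implementation. The class name `SqrtEntropyScheduler`, the choice of proxy (an exponential moving average of absolute changes in a scalar training signal such as mean batch reward), and all constants are hypothetical; the abstract does not specify them.

```python
import math


class SqrtEntropyScheduler:
    """Illustrative only: entropy weight = base + scale * sqrt(drift proxy).

    The proxy here is an EMA of absolute changes in an observed scalar
    signal. This is an assumed stand-in, not the proxy used in the paper.
    """

    def __init__(self, base_coef=0.01, scale=0.1, ema_decay=0.9, max_coef=1.0):
        self.base_coef = base_coef    # floor used when no drift is observed
        self.scale = scale            # hypothetical gain on sqrt(proxy)
        self.ema_decay = ema_decay    # smoothing factor for the proxy
        self.max_coef = max_coef      # upper clip to keep updates stable
        self.prev_signal = None
        self.drift_proxy = 0.0

    def update(self, signal):
        """Consume one observed signal value and return the entropy coefficient."""
        if self.prev_signal is not None:
            change = abs(signal - self.prev_signal)
            self.drift_proxy = (
                self.ema_decay * self.drift_proxy
                + (1.0 - self.ema_decay) * change
            )
        self.prev_signal = signal
        coef = self.base_coef + self.scale * math.sqrt(self.drift_proxy)
        return min(coef, self.max_coef)


scheduler = SqrtEntropyScheduler()
stable = [scheduler.update(1.0) for _ in range(20)]      # no drift observed
after_shift = scheduler.update(0.0)                      # abrupt change
recovering = [scheduler.update(0.0) for _ in range(50)]  # new stable regime
```

In this sketch the coefficient stays at its floor while the signal is constant, rises when the signal shifts abruptly, and decays back as the proxy settles. In a maximum-entropy learner the returned value would stand in for a fixed entropy coefficient or temperature at each update.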

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Advanced Bandit Algorithms Research