Tempo Adaptation in Non-stationary Reinforcement Learning
Hyunin Lee, Yuhao Ding, Jongmin Lee, Ming Jin, Javad Lavaei, Somayeh Sojoudi

TL;DR
This paper addresses the challenge of time synchronization in non-stationary reinforcement learning by proposing a framework that optimally schedules interaction times to improve policy performance in changing environments.
Contribution
It introduces the Proactively Synchronizing Tempo (ProST) framework, which computes a sequence of interaction times by minimizing an upper bound on dynamic regret, balancing the time spent training the policy against how fast the environment changes.
Findings
ProST outperforms existing methods in high-dimensional non-stationary environments.
Theoretical analysis shows sublinear dynamic regret with ProST.
Optimal scheduling improves online returns in experiments.
Abstract
We first raise and tackle a "time synchronization" issue between the agent and the environment in non-stationary reinforcement learning (RL), a crucial factor hindering its real-world applications. In reality, environmental changes occur over wall-clock time ($t$) rather than episode progress ($k$), where wall-clock time signifies the actual elapsed time within the fixed duration $t \in [0, T]$. In existing works, at episode $k$, the agent rolls a trajectory and trains a policy before transitioning to episode $k+1$. In the context of the time-desynchronized environment, however, the agent at time $t_k$ allocates $\Delta t$ for trajectory generation and training, subsequently moves to the next episode at $t_{k+1} = t_k + \Delta t$. Despite a fixed total number of episodes ($K$), the agent accumulates different trajectories influenced by the choice of interaction times…
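The time-stepping described in the abstract can be illustrated with a minimal sketch. It is not the ProST algorithm: the function names, the constant per-episode time budget, and the linear drift model are all hypothetical, chosen only to show that the same number of episodes lands on different wall-clock times, and so on different versions of the environment, depending on the time allotted per episode.

```python
def interaction_times(t_start, delta_t, num_episodes):
    """Wall-clock times of each episode, assuming a constant budget delta_t
    per episode, so that the next time equals the current time plus delta_t."""
    if delta_t <= 0:
        raise ValueError("delta_t must be positive")
    return [t_start + k * delta_t for k in range(num_episodes)]


def drifting_parameter(t, rate=0.1):
    """Hypothetical environment parameter that drifts linearly in wall-clock time."""
    return rate * t


# Same number of episodes, two different per-episode time budgets.
fast = interaction_times(t_start=0.0, delta_t=1.0, num_episodes=5)
slow = interaction_times(t_start=0.0, delta_t=2.5, num_episodes=5)

# The environment the agent encounters at its final episode differs,
# even though the episode index is the same in both runs.
fast_final_env = drifting_parameter(fast[-1])
slow_final_env = drifting_parameter(slow[-1])
```

A shorter budget keeps the agent closer to the environment's current state but leaves less time for training per episode; a longer budget does the opposite. That trade-off is what the choice of interaction times controls.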
Taxonomy
Topics: Neural Networks and Reservoir Computing · Reinforcement Learning in Robotics · Smart Grid Energy Management
