Adam on Local Time: Addressing Nonstationarity in RL with Relative Adam Timesteps
Benjamin Ellis, Matthew T. Jackson, Andrei Lupu, Alexander D. Goldie, Mattie Fellows, Shimon Whiteson, Jakob Foerster

TL;DR
This paper introduces Adam-Rel, a modified optimizer for reinforcement learning that resets the timestep after target network updates, effectively addressing nonstationarity and improving performance across various RL benchmarks.
Contribution
The paper proposes Adam-Rel, an adaptation of Adam that resets its timestep after each target network update to better handle nonstationarity in RL, and reports that it outperforms standard Adam on Atari and Craftax.
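As a rough illustration of the idea, the sketch below implements a scalar Adam step whose timestep counter can be reset. It is not the authors' code: the class name `AdamRelSketch`, the method `reset_timestep`, and the default hyperparameters are hypothetical, and the choice to keep the moment estimates while resetting only the counter is an assumption on our part.

```python
import math


class AdamRelSketch:
    """Scalar Adam with a resettable timestep (illustrative sketch only)."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # first-moment (momentum) estimate
        self.v = 0.0  # second-moment estimate
        self.t = 0    # timestep used for bias correction

    def reset_timestep(self):
        # Assumption: only the counter is reset; m and v are kept as they are.
        self.t = 0

    def step(self, param, grad):
        # Standard Adam update, with bias correction driven by self.t.
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


opt = AdamRelSketch()
theta = 0.0
for _ in range(5):
    theta = opt.step(theta, grad=1.0)
opt.reset_timestep()  # e.g. called right after a target network update
theta = opt.step(theta, grad=1.0)
```

In an RL training loop, `reset_timestep` would presumably be called wherever the target changes, while every other optimizer call stays the same.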
Findings
Adam-Rel reduces large updates caused by nonstationary gradients.
Adam-Rel improves performance on the Atari and Craftax benchmarks.
Increases in gradient norm are observed in RL training, consistent with the theoretical model.
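The first finding can be illustrated with a toy calculation. The sketch below feeds a scalar Adam a small constant gradient for many steps, then a much larger one, and records the peak update size relative to the learning rate, with and without a timestep reset at the jump. The gradient magnitudes, step counts, and hyperparameters are illustrative assumptions rather than values from the paper, and keeping the moment estimates across the reset is likewise assumed.

```python
import math


def max_relative_update(reset_at_jump, beta1=0.9, beta2=0.999, eps=1e-8,
                        warm_steps=1000, jump_steps=50,
                        small_grad=0.01, large_grad=1.0):
    """Return the largest |update| / lr seen after the gradient scale jumps."""
    m = v = 0.0
    t = 0
    largest = 0.0
    for step in range(warm_steps + jump_steps):
        after_jump = step >= warm_steps
        if reset_at_jump and step == warm_steps:
            t = 0  # only the counter is reset; m and v are kept (assumption)
        grad = large_grad if after_jump else small_grad
        t += 1
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad * grad
        m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
        if after_jump:
            largest = max(largest, abs(m_hat) / (math.sqrt(v_hat) + eps))
    return largest


adam_peak = max_relative_update(reset_at_jump=False)
rel_peak = max_relative_update(reset_at_jump=True)
print(f"standard Adam, peak update / lr:  {adam_peak:.2f}")
print(f"timestep reset, peak update / lr: {rel_peak:.2f}")
```

In this toy setting the peak update without a reset is several times the learning rate, while with a reset it stays close to the learning rate. This is only a scalar caricature of a target network change, not a reproduction of the paper's analysis.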
Abstract
In reinforcement learning (RL), it is common to apply techniques used broadly in machine learning such as neural network function approximators and momentum-based optimizers. However, such tools were largely developed for supervised learning rather than nonstationary RL, leading practitioners to adopt target networks, clipped policy updates, and other RL-specific implementation tricks to combat this mismatch, rather than directly adapting this toolchain for use in RL. In this paper, we take a different approach and instead address the effect of nonstationarity by adapting the widely used Adam optimiser. We first analyse the impact of nonstationary gradient magnitude -- such as that caused by a change in target network -- on Adam's update size, demonstrating that such a change can lead to large updates and hence sub-optimal performance. To address this, we introduce Adam-Rel. Rather than…
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Logic, Reasoning, and Knowledge
Methods: ADaptive gradient method with the OPTimal convergence rate · Adam
