Adam on Local Time: Addressing Nonstationarity in RL with Relative Adam Timesteps

Benjamin Ellis; Matthew T. Jackson; Andrei Lupu; Alexander D. Goldie; Mattie Fellows; Shimon Whiteson; Jakob Foerster

arXiv:2412.17113·cs.LG·December 24, 2024

TL;DR

This paper introduces Adam-Rel, a modification of the Adam optimizer for reinforcement learning that resets Adam's timestep after changes to the target, such as target network updates. This limits the large updates that nonstationarity can cause and improves performance on the Atari and Craftax benchmarks.

Contribution

The paper proposes Adam-Rel, an adaptation of Adam that uses a local timestep within each epoch, reset when the target changes, rather than a single global timestep. This handles nonstationarity in RL better and outperforms standard Adam.
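
The idea can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the class name `AdamRelSketch` and the method `reset_timestep` are hypothetical, the hyperparameter defaults are the usual Adam defaults, and the sketch assumes that only the timestep is reset while the moment estimates are kept.

```python
import numpy as np


class AdamRelSketch:
    """Adam whose bias-correction timestep can be reset (hypothetical helper)."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None  # first-moment (momentum) estimate
        self.v = None  # second-moment estimate
        self.t = 0     # local timestep used for bias correction

    def reset_timestep(self):
        # Assumption: only t is reset; m and v carry over unchanged.
        self.t = 0

    def step(self, params, grad):
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)
        self.t += 1
        # Standard Adam moment updates.
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad**2
        # Bias correction uses the local timestep, not a global step count.
        m_hat = self.m / (1 - self.beta1**self.t)
        v_hat = self.v / (1 - self.beta2**self.t)
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

In a training loop, `reset_timestep()` would be called whenever the target changes, for example after a target network update or at the start of a new epoch, and `step` is otherwise used exactly like an ordinary Adam update.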

Findings

01

Adam-Rel reduces large updates caused by nonstationary gradients.

02

Adam-Rel improves performance over Adam on the Atari and Craftax benchmarks.

03

Gradient norm increases are observed in RL, validating the theoretical model.
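
The first finding can be illustrated with a small scalar simulation. The gradient stream below is invented for illustration (magnitude 1 for 10,000 steps, then a jump to 100, standing in for a change of target), the function name `adam_update_sizes` is hypothetical, and the reset rule again assumes that only the timestep is reset while the moments are kept. It is a sketch of the effect, not a reproduction of the paper's analysis.

```python
import math


def adam_update_sizes(grads, reset_at=None, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return |m_hat / (sqrt(v_hat) + eps)| for each scalar gradient in `grads`.

    If `reset_at` is given, the timestep is set back to 0 just before that
    index while the moments are kept (the assumed Adam-Rel behaviour).
    """
    m = v = 0.0
    t = 0
    sizes = []
    for i, g in enumerate(grads):
        if reset_at is not None and i == reset_at:
            t = 0
        t += 1
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1**t)
        v_hat = v / (1 - beta2**t)
        sizes.append(abs(m_hat / (math.sqrt(v_hat) + eps)))
    return sizes


# Hypothetical gradient stream: magnitude 1, then a sudden jump to 100.
jump = 10_000
grads = [1.0] * jump + [100.0] * 50

global_t = adam_update_sizes(grads)                # standard Adam, global timestep
local_t = adam_update_sizes(grads, reset_at=jump)  # timestep reset at the jump

print("largest update after jump, global t:", max(global_t[jump:]))
print("largest update after jump, local t: ", max(local_t[jump:]))
```

Under these assumptions the update size (in units of the learning rate) is about 1 before the jump. With the global timestep it grows to several times that after the jump, because the second-moment estimate adapts far more slowly than the momentum. With the timestep reset, the bias correction rescales both moments and the update stays close to 1.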

Abstract

In reinforcement learning (RL), it is common to apply techniques used broadly in machine learning such as neural network function approximators and momentum-based optimizers. However, such tools were largely developed for supervised learning rather than nonstationary RL, leading practitioners to adopt target networks, clipped policy updates, and other RL-specific implementation tricks to combat this mismatch, rather than directly adapting this toolchain for use in RL. In this paper, we take a different approach and instead address the effect of nonstationarity by adapting the widely used Adam optimiser. We first analyse the impact of nonstationary gradient magnitude -- such as that caused by a change in target network -- on Adam's update size, demonstrating that such a change can lead to large updates and hence sub-optimal performance. To address this, we introduce Adam-Rel. Rather than…

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Logic, Reasoning, and Knowledge

Methods: ADaptive gradient method with the OPTimal convergence rate · Adam