Revisiting Adam for Streaming Reinforcement Learning
Florin Gogianu, Adrian Catalin Lutu, Razvan Pascanu

TL;DR
This paper examines how established updates, such as those of DQN and C51, perform in the streaming (online) reinforcement learning setting, with an emphasis on the properties of the Adam optimiser. It introduces Adaptive Q$(\theta)$, a variance-adjusted algorithm that outperforms previous methods on Atari games.
Contribution
The study identifies the properties that make online RL updates robust and proposes a new variance-adjusted algorithm, Adaptive Q$(\theta)$, which achieves superior performance on Atari.
Findings
C51 performs well with online updates and is competitive with StreamQ.
The interaction between Adam and the update rule benefits from bounded derivatives and from variance adjustment.
Adaptive Q$(\theta)$ approaches twice the human baseline on Atari games.
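Why bounded derivatives might matter for Adam can be illustrated with the standard Adam update alone. The sketch below is an illustration, not the paper's analysis. It assumes a single scalar parameter and a hypothetical stream of TD errors containing one large outlier. It compares the derivative of a squared loss with the clipped derivative of a Huber loss, the kind of error clipping commonly used with DQN. Adam divides each step by a running root-mean-square of past gradients, so a single unbounded derivative inflates that denominator and shrinks the effective step size for many subsequent updates; a clipped derivative does not. The helper names (`ScalarAdam`, `step_size_after_outlier`) are hypothetical.

```python
import math


def squared_grad(td_error: float) -> float:
    """Derivative of 0.5 * td_error**2 with respect to the error: unbounded."""
    return td_error


def huber_grad(td_error: float, delta: float = 1.0) -> float:
    """Derivative of the Huber loss with respect to the error: clipped to [-delta, delta]."""
    return max(-delta, min(delta, td_error))


class ScalarAdam:
    """Standard Adam with bias correction, for a single scalar parameter."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # running mean of gradients
        self.v = 0.0  # running mean of squared gradients
        self.t = 0

    def step(self, grad: float) -> float:
        """Consume one gradient and return the resulting parameter change."""
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return -self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def step_size_after_outlier(grad_fn, lr=1e-3, n_before=200, n_after=50,
                            typical=0.5, outlier=100.0):
    """Feed a hypothetical TD-error stream (typical..., one outlier, typical...)
    through Adam and return the magnitude of the final parameter change."""
    opt = ScalarAdam(lr=lr)
    stream = [typical] * n_before + [outlier] + [typical] * n_after
    change = 0.0
    for td_error in stream:
        change = opt.step(grad_fn(td_error))
    return abs(change)


lr = 1e-3
print("squared:", step_size_after_outlier(squared_grad, lr) / lr)  # step as a fraction of lr
print("huber:  ", step_size_after_outlier(huber_grad, lr) / lr)
```

With these settings, 50 updates after the outlier the squared-loss step is still well under a fifth of the learning rate, while the Huber step remains close to it. Because `beta2` is 0.999, the inflated second-moment estimate takes on the order of thousands of steps to decay.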
Abstract
Learning from a sequence of interactions, as soon as observations are perceived and acted upon, without explicitly storing them, holds the promise of simpler, more efficient and adaptive algorithms. For over a decade, however, deep reinforcement learning walked the contrary path, augmenting agents with replay buffers or parallel sampling routines, in an effort to tame learning instability. Recently, this topic has been revisited by Elsayed et al. (2024), focusing on update computation through eligibility traces and modifications to the optimisation routine, resulting in the StreamQ algorithm. In this work we take a step back, investigating the efficacy of established updates, such as those implemented by DQN and C51 within this online setting. Not only do we find that they perform well, but through analysing how the optimisation algorithm generally, and Adam in particular, interacts…
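The streaming setting described in the abstract can be made concrete with a minimal sketch of a DQN-style one-step Q-learning update that consumes each transition once, takes a single Adam step, and discards it. Everything here is an illustrative assumption, not the authors' implementation. The `StreamingQ` class is hypothetical, as are the one-hot (tabular) features, the Huber-clipped TD error, and the absence of a target network; the hyperparameters are arbitrary.

```python
import math


class StreamingQ:
    """Linear Q-function trained one transition at a time, with no replay buffer.

    Hypothetical illustration: one-hot state-action features, a one-step
    Q-learning target without a target network, a Huber-clipped TD error,
    and standard dense Adam over the whole weight vector.
    """

    def __init__(self, n_states, n_actions, lr=0.05, gamma=0.99,
                 b1=0.9, b2=0.999, eps=1e-8):
        self.n_actions, self.gamma = n_actions, gamma
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        size = n_states * n_actions
        self.w = [0.0] * size  # with one-hot features, weights are the Q-values
        self.m = [0.0] * size  # Adam first-moment estimate
        self.v = [0.0] * size  # Adam second-moment estimate
        self.t = 0

    def q(self, s, a):
        return self.w[s * self.n_actions + a]

    def update(self, s, a, r, s_next, done):
        """Consume a single transition, take one Adam step, and store nothing."""
        bootstrap = 0.0 if done else max(self.q(s_next, b) for b in range(self.n_actions))
        td_error = r + self.gamma * bootstrap - self.q(s, a)
        # Semi-gradient of the Huber loss w.r.t. w: only the (s, a) entry is
        # nonzero, and its magnitude is clipped to 1.
        grad = [0.0] * len(self.w)
        grad[s * self.n_actions + a] = -max(-1.0, min(1.0, td_error))
        self.t += 1
        for i, g in enumerate(grad):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
            m_hat = self.m[i] / (1 - self.b1 ** self.t)
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            self.w[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
        return td_error


agent = StreamingQ(n_states=3, n_actions=2)
# A short, made-up stream of (s, a, r, s_next, done) transitions.
for transition in [(0, 1, 0.0, 1, False), (1, 1, 1.0, 2, True)]:
    agent.update(*transition)  # each transition is used once, then dropped
```

The agent's only state is the weight vector and Adam's moment estimates, so memory stays constant no matter how long the stream runs. That constant footprint is the property that distinguishes this setting from replay-based training.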
