Intentional Updates for Streaming Reinforcement Learning
Arsalan Sharifnassab, Mohamed Elsayed, Kris De Asis, A. Rupam Mahmood, Richard S. Sutton

TL;DR
This paper introduces intentional updates for streaming reinforcement learning: each update first specifies a desired outcome and then solves for the step size that approximately achieves it. This improves stability in the online setting and yields state-of-the-art streaming performance.
Contribution
It extends the concept of intentional updates from online supervised linear regression (Normalized Least Mean Squares) to streaming deep reinforcement learning, proposing practical algorithms for more stable online learning.
Findings
The proposed methods achieve state-of-the-art performance in the streaming setting.
The streaming algorithms perform comparably to batch and replay-buffer approaches.
Intentional updates improve stability in online reinforcement learning.
Abstract
In gradient-based learning, a step size chosen in parameter units does not produce a predictable per-step change in function output. This often leads to instability in the streaming setting (i.e., batch size = 1), where stochasticity is not averaged out and update magnitudes can momentarily become arbitrarily large or small. Instead, we propose intentional updates: first specify the intended outcome of an update, then solve for the step size that approximately achieves it. This strategy has precedent in online supervised linear regression via the Normalized Least Mean Squares (NLMS) algorithm, which selects a step size to yield a specified change in the function output proportional to the current error. We extend this principle to streaming deep reinforcement learning by defining appropriate intended outcomes: Intentional TD aims for a fixed fractional reduction of the TD error, and Intentional…
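To make the NLMS precedent mentioned in the abstract concrete, here is a minimal sketch of the textbook NLMS update for a linear predictor. It is not the paper's reinforcement learning algorithm. The function name `nlms_step`, the parameter `eta` (the intended fractional error reduction), the stabilizer `eps`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def nlms_step(w, x, y, eta=0.5, eps=1e-8):
    """One NLMS update for the linear predictor w @ x.

    The step size is solved for rather than fixed: dividing by x @ x makes
    the prediction on this sample move by about eta * error, regardless of
    the input scale. Returns the new weights and the pre-update error.
    """
    error = y - w @ x
    step_size = eta / (x @ x + eps)  # eps (assumed) guards against x close to 0
    return w + step_size * error * x, error


# Streaming demo on synthetic, noise-free data (hypothetical setup).
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])
w = np.zeros(3)
ratios = []
for _ in range(200):
    # Input scale varies by a factor of 100 between samples.
    x = rng.normal(size=3) * rng.choice([0.1, 10.0])
    y = w_true @ x
    w, err_before = nlms_step(w, x, y, eta=0.5)
    err_after = y - w @ x
    if abs(err_before) > 1e-6:
        ratios.append(err_after / err_before)

# Each recorded ratio is close to 1 - eta, independent of the input scale.
print(min(ratios), max(ratios))
```

With a fixed step size in parameter units, the same samples would change the prediction by an amount proportional to `x @ x`, which here spans several orders of magnitude. Normalizing by `x @ x` is what turns the step size into a statement about the intended outcome.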
