Streaming Deep Reinforcement Learning Finally Works
Mohamed Elsayed, Gautham Vasan, A. Rupam Mahmood

TL;DR
This paper introduces stream-x algorithms that enable stable, efficient deep reinforcement learning in streaming settings, overcoming previous instability issues and matching batch RL performance.
Contribution
The paper presents the first class of deep RL algorithms designed for streaming learning, overcoming the stream barrier and achieving stable learning comparable to batch methods.
Findings
Existing deep RL algorithms suffer from a stream barrier: in the streaming setting they are frequently unstable or fail to learn.
Stream-x algorithms overcome the stream barrier.
Stream-x algorithms achieve state-of-the-art performance in DM Control environments.
Abstract
Natural intelligence processes experience as a continuous stream, sensing, acting, and learning moment-by-moment in real time. Streaming learning, the modus operandi of classic reinforcement learning (RL) algorithms like Q-learning and TD, mimics natural learning by using the most recent sample without storing it. This approach is also ideal for resource-constrained, communication-limited, and privacy-sensitive applications. However, in deep RL, learners almost always use batch updates and replay buffers, making them computationally expensive and incompatible with streaming learning. Although the prevalence of batch deep RL is often attributed to its sample efficiency, a more critical reason for the absence of streaming deep RL is its frequent instability and failure to learn, which we refer to as stream barrier. This paper introduces the stream-x algorithms, the first class of deep RL…
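To make the streaming setting concrete, the sketch below shows classic tabular Q-learning updating from each transition once and then discarding it, with no replay buffer and no batches. It is not one of the paper's stream-x algorithms. The chain environment, the uniformly random behaviour policy, and all hyperparameter values are hypothetical choices made only for illustration.

```python
import random


def streaming_q_learning(n_states=5, steps=20000, alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning on a toy chain, learning from one sample at a time.

    Hypothetical environment: states 0..n_states-1 in a row. Action 1 moves
    right, action 0 moves left (staying put at state 0). Reaching the last
    state yields reward 1 and ends the episode, which restarts at state 0.
    """
    rng = random.Random(seed)
    n_actions = 2
    q = [[0.0] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(steps):
        # Behaviour policy: uniformly random (Q-learning is off-policy,
        # so it can still learn greedy action values from this data).
        action = rng.randrange(n_actions)

        # Environment step.
        if action == 1:
            next_state = min(state + 1, n_states - 1)
        else:
            next_state = max(state - 1, 0)
        done = next_state == n_states - 1
        reward = 1.0 if done else 0.0

        # Streaming update: use the most recent transition once.
        target = reward if done else reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])

        # The transition is never stored; only the current state is kept.
        state = 0 if done else next_state
    return q


if __name__ == "__main__":
    for s, row in enumerate(streaming_q_learning()):
        print(s, [round(v, 3) for v in row])
```

Memory use here is independent of the number of steps taken, which is the property that makes streaming learning attractive for resource-constrained settings. The abstract's point is that carrying this one-sample-at-a-time scheme over to deep networks has been frequently unstable.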
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Q-Learning · Sparse Evolutionary Training
