Squeezing More from the Stream: Learning Representation Online for Streaming Reinforcement Learning

Nilaksh; Antoine Clavaud; Mathieu Reymond; François Rivest; Sarath Chandar

arXiv:2602.09396 · cs.LG · February 11, 2026

TL;DR

This paper introduces a method for streaming reinforcement learning that improves representation learning from transient, single-use data by extending Self-Predictive Representations (SPR) to the streaming setting and resolving the training instabilities this causes, yielding better performance and richer representations.

Contribution

It extends Self-Predictive Representations (SPR) to streaming RL, introduces orthogonal gradient updates relative to the momentum target to stabilize training, and demonstrates improved performance and richer representations without a replay buffer.
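
To make the SPR ingredient concrete, here is a minimal sketch of a one-step self-predictive loss with a momentum (EMA) target, assuming tiny linear-tanh encoders and a linear action-conditioned transition model. The names `spr_loss`, `ema_update`, the dict keys `"enc"`/`"trans"`, and the value `tau=0.99` are hypothetical illustrations, not the paper's architecture; the original SPR predicts several steps ahead through learned projection heads, which this sketch omits. In the streaming setting the batch is a single transition, so the arrays below have a leading dimension of 1.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-8):
    """Row-wise cosine similarity between two (batch, dim) arrays."""
    a_n = a / (np.linalg.norm(a, axis=-1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=-1, keepdims=True) + eps)
    return np.sum(a_n * b_n, axis=-1)

def spr_loss(online, target, obs, action_onehot, next_obs):
    """Hypothetical one-step self-predictive loss.

    `online` and `target` are dicts of linear weights:
      'enc':   (obs_dim, z_dim) encoder
      'trans': (z_dim + n_actions, z_dim) transition model
    The target network only produces the regression target; in an
    autodiff implementation no gradient would flow through it.
    """
    z = np.tanh(obs @ online["enc"])                     # online latent of s_t
    za = np.concatenate([z, action_onehot], axis=-1)     # condition on a_t
    z_pred = za @ online["trans"]                        # predicted latent of s_{t+1}
    z_tgt = np.tanh(next_obs @ target["enc"])            # momentum-target latent of s_{t+1}
    # Negative cosine similarity: minimized when prediction aligns with target.
    return float(np.mean(-cosine_similarity(z_pred, z_tgt)))

def ema_update(target, online, tau=0.99):
    """Momentum target update: target <- tau * target + (1 - tau) * online."""
    return {k: tau * target[k] + (1.0 - tau) * online[k] for k in target}

# Usage on a single streamed transition (obs_dim=4, z_dim=8, 3 actions).
rng = np.random.default_rng(0)
online = {"enc": rng.normal(size=(4, 8)), "trans": rng.normal(size=(11, 8))}
target = {k: v.copy() for k, v in online.items()}
obs, next_obs = rng.normal(size=(1, 4)), rng.normal(size=(1, 4))
action = np.eye(3)[[1]]                                  # one-hot, shape (1, 3)
loss = spr_loss(online, target, obs, action, next_obs)
target = ema_update(target, online)
```

The cosine objective only constrains the direction of the predicted latent, and the slowly moving EMA target is what keeps this kind of bootstrapped objective from collapsing trivially.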

Findings

01. Outperforms existing streaming RL baselines across the Atari, MinAtar, and Octax suites.

02. Learns significantly richer and more meaningful representations.

03. Remains computationally efficient, training on only a few CPU cores.

Abstract

In streaming Reinforcement Learning (RL), transitions are observed and discarded immediately after a single update. While this minimizes resource usage for on-device applications, it makes agents notoriously sample-inefficient, since value-based losses alone struggle to extract meaningful representations from transient data. We propose extending Self-Predictive Representations (SPR) to the streaming pipeline to maximize the utility of every observed frame. However, due to the highly correlated samples induced by the streaming regime, naively applying this auxiliary loss results in training instabilities. Thus, we introduce orthogonal gradient updates relative to the momentum target and resolve gradient conflicts arising from streaming-specific optimizers. Validated across the Atari, MinAtar, and Octax suites, our approach systematically outperforms existing streaming baselines.…
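
The abstract does not say which vector the gradient is made orthogonal to, so the sketch below only illustrates the generic operation: removing the component of a gradient along a reference direction, g' = g - (⟨g, d⟩ / ‖d‖²) d. Choosing d as the offset between online and momentum-target parameters is purely an assumption made for illustration, as are the helper names `project_orthogonal` and `orthogonal_step` and the learning rate. This is not the authors' update rule.

```python
import numpy as np

def project_orthogonal(grad, direction, eps=1e-12):
    """Remove the component of `grad` lying along `direction`.

    Returns grad - (<grad, d> / ||d||^2) * d, which is orthogonal to
    `direction` up to floating-point error. If `direction` is (near) zero,
    the gradient is returned unchanged.
    """
    g = np.ravel(grad).astype(float)
    d = np.ravel(direction).astype(float)
    denom = float(d @ d)
    if denom < eps:
        return np.asarray(grad, dtype=float).copy()
    g_orth = g - (g @ d) / denom * d
    return g_orth.reshape(np.shape(grad))

def orthogonal_step(theta, theta_target, grad_aux, lr=1e-3):
    """Hypothetical update: descend on an auxiliary gradient after projecting
    out its component along (theta - theta_target). The choice of reference
    direction is an illustrative assumption, not taken from the paper."""
    return theta - lr * project_orthogonal(grad_aux, theta - theta_target)

# Usage: the component of [1, 1] along [1, 0] is removed before stepping.
theta = np.array([1.0, 0.0])
theta_target = np.array([0.0, 0.0])
grad_aux = np.array([1.0, 1.0])
new_theta = orthogonal_step(theta, theta_target, grad_aux, lr=0.1)  # [1.0, -0.1]
```

A projection of this kind leaves the part of the update that is perpendicular to the reference direction untouched, so it can stop the auxiliary loss from moving parameters along one specific direction without discarding the rest of its signal.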

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Adversarial Robustness in Machine Learning