Towards Batch-to-Streaming Deep Reinforcement Learning for Continuous Control

Riccardo De Monte, Matteo Cederle, and Gian Antonio Susto

arXiv:2603.08588·cs.LG·May 12, 2026

TL;DR

This paper introduces two novel streaming deep reinforcement learning algorithms, S2AC and SDAC, designed for resource-efficient continuous control and on-device finetuning, with a focus on batch-to-streaming transition challenges.

Contribution

The paper proposes two algorithms, S2AC and SDAC, that are compatible with batch RL methods and suitable for resource-limited devices, and it addresses the batch-to-streaming transition problem.

Findings

01. S2AC and SDAC achieve performance comparable to existing streaming RL baselines.

02. The algorithms are effective for on-device finetuning and Sim2Real transfer.

03. A principled approach improves batch-to-streaming transition performance.

Abstract

State-of-the-art deep reinforcement learning (RL) methods have achieved remarkable performance in continuous control tasks, yet their computational complexity is often incompatible with the constraints of resource-limited hardware, due to their reliance on replay buffers, batch updates, and target networks. The emerging paradigm of streaming deep RL addresses this limitation through purely online updates, achieving strong empirical performance on standard benchmarks. In this work, we propose two novel streaming deep RL algorithms, Streaming Soft Actor-Critic (S2AC) and Streaming Deterministic Actor-Critic (SDAC), explicitly designed to be compatible with state-of-the-art batch RL methods, making them particularly suitable for on-device finetuning applications such as Sim2Real transfer. Both algorithms achieve performance comparable to state-of-the-art streaming baselines on standard…
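To make "purely online updates" concrete, here is a minimal sketch of a streaming update rule. It is not S2AC or SDAC: it is plain tabular TD(0) on a hypothetical 5-state random walk, chosen only to show the defining property the abstract describes, namely that each transition is used for one update and then discarded, with no replay buffer, no batch updates, and no target network. The function name `streaming_td0`, the toy environment, and all hyperparameters are assumptions made for illustration.

```python
import random


def streaming_td0(num_steps=5000, alpha=0.1, gamma=0.9, seed=0):
    """Tabular TD(0) in streaming form on a hypothetical 5-state random walk.

    Illustrative only: every transition updates the estimate once and is
    then thrown away (no replay buffer, no mini-batch, no target network).
    """
    rng = random.Random(seed)
    n_states = 5
    values = [0.0] * n_states          # one value estimate per state
    start = n_states // 2
    state = start

    for _ in range(num_steps):
        # Toy environment: step left or right with equal probability.
        next_state = state + rng.choice((-1, 1))
        if next_state < 0:             # fell off the left end: reward 0
            reward, done = 0.0, True
        elif next_state >= n_states:   # fell off the right end: reward 1
            reward, done = 1.0, True
        else:
            reward, done = 0.0, False

        # One-sample TD error, bootstrapping from the live estimate.
        bootstrap = 0.0 if done else gamma * values[next_state]
        td_error = reward + bootstrap - values[state]

        # Immediate update; the transition is not stored anywhere.
        values[state] += alpha * td_error

        state = start if done else next_state

    return values


values = streaming_td0()
print([round(v, 3) for v in values])
```

Memory use here is constant in the number of environment steps, which is the property that makes streaming methods attractive on resource-limited hardware. A batch method would instead store transitions and repeatedly sample mini-batches from them.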
