D-SPEAR: Dual-Stream Prioritized Experience Adaptive Replay for Stable Reinforcement Learning in Robotic Manipulation
Yu Zhang, Karl Mason

TL;DR
D-SPEAR introduces a dual-stream replay framework that decouples actor and critic sampling, enhancing stability and performance in robotic manipulation reinforcement learning tasks.
Contribution
The paper proposes an adaptive replay mechanism in which the actor and the critic sample through separate streams from a shared replay buffer, improving training stability and effectiveness over existing methods.
Findings
D-SPEAR outperforms SAC, TD3, and DDPG on Robosuite tasks.
The adaptive anchor balances uniform and prioritized sampling based on the coefficient of variation of TD errors.
Ablation studies confirm the benefits of dual-stream replay.
Abstract
Robotic manipulation remains challenging for reinforcement learning due to contact-rich dynamics, long horizons, and training instability. Although off-policy actor-critic algorithms such as SAC and TD3 perform well in simulation, they often suffer from policy oscillations and performance collapse in realistic settings, partly due to experience replay strategies that ignore the differing data requirements of the actor and the critic. We propose D-SPEAR: Dual-Stream Prioritized Experience Adaptive Replay, a replay framework that decouples actor and critic sampling while maintaining a shared replay buffer. The critic leverages prioritized replay for efficient value learning, whereas the actor is updated using low-error transitions to stabilize policy optimization. An adaptive anchor mechanism balances uniform and prioritized sampling based on the coefficient of variation of TD errors, and…
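The dual-stream idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the anchor formula, the priority definition, or the rule for selecting low-error transitions. The names `anchor_weight`, `sample_critic`, `sample_actor`, and the parameters `scale` and `low_error_fraction` are hypothetical, and the assumption that a higher coefficient of variation shifts weight toward prioritized sampling is ours.

```python
import random
import statistics


def coefficient_of_variation(td_errors):
    """Population std divided by mean of |TD error|; 0.0 if the mean is zero."""
    abs_errors = [abs(e) for e in td_errors]
    mean = statistics.fmean(abs_errors)
    if mean == 0.0:
        return 0.0
    return statistics.pstdev(abs_errors) / mean


def anchor_weight(td_errors, scale=1.0):
    """Hypothetical mapping from CV to a mixing weight in [0, 1).

    Assumption: higher CV puts more weight on prioritized sampling.
    """
    cv = coefficient_of_variation(td_errors)
    return cv / (cv + scale)


def sample_critic(td_errors, batch_size, rng, eps=1e-6):
    """Critic stream: mixture of prioritized and uniform sampling."""
    n = len(td_errors)
    lam = anchor_weight(td_errors)
    priorities = [abs(e) + eps for e in td_errors]
    total = sum(priorities)
    # Blend the prioritized distribution with the uniform one.
    probs = [lam * p / total + (1.0 - lam) / n for p in priorities]
    return rng.choices(range(n), weights=probs, k=batch_size)


def sample_actor(td_errors, batch_size, rng, low_error_fraction=0.5):
    """Actor stream: uniform sampling restricted to low-|TD error| transitions."""
    n = len(td_errors)
    order = sorted(range(n), key=lambda i: abs(td_errors[i]))
    keep = max(1, int(n * low_error_fraction))
    return rng.choices(order[:keep], k=batch_size)


# Both streams index into the same shared buffer (here, just a list of TD errors).
rng = random.Random(0)
td_errors = [0.1, 0.2, 5.0, 0.05, 0.3, 4.0]
critic_batch = sample_critic(td_errors, batch_size=4, rng=rng)
actor_batch = sample_actor(td_errors, batch_size=4, rng=rng)
```

Both functions return indices into one buffer, so the decoupling affects only which transitions each network sees, not how they are stored.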
