Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound

Tal Fiskus; Uri Shaham

arXiv:2507.11269·cs.LG·October 20, 2025

TL;DR

This paper introduces a causal bound framework for deep reinforcement learning that leverages past data more effectively, significantly improving sample efficiency and performance across multiple domains.

Contribution

It presents a novel causal bound on the factual loss in DRL, integrating the Neyman-Rubin potential outcomes framework and reusing past value-network outputs stored in the experience replay buffer to improve learning efficiency.

Findings

01. Up to 383% higher reward ratio in experiments
02. Experience replay buffer size reduced by up to 96%
03. Significant improvement in sample efficiency at negligible cost

Abstract

Deep reinforcement learning (DRL) agents excel at solving complex decision-making tasks across various domains. However, they often require a substantial number of training steps and a vast experience replay buffer, leading to significant computational and resource demands. To address these challenges, we introduce a novel theoretical result that brings the Neyman-Rubin potential outcomes framework into DRL. Unlike most methods, which focus on bounding the counterfactual loss, we establish a causal bound on the factual loss, which is analogous to the on-policy loss in DRL. This bound is computed by storing past value-network outputs in the experience replay buffer, effectively reusing data that is usually discarded. Extensive experiments across the Atari 2600 and MuJoCo domains, with agents such as DQN and SAC, achieve up to 383% higher reward ratio, outperforming the same…
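The abstract says the bound is computed by storing past value-network outputs in the experience replay buffer. The sketch below illustrates only that bookkeeping, under stated assumptions: each transition is stored together with the scalar value output recorded when it was collected. The class `ValueStoringReplayBuffer` and the function `stored_value_gap` are hypothetical names, and the gap it computes is a plain diagnostic (mean absolute difference between current and stored values), not the paper's causal bound, whose exact form is not given here.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Transition:
    state: Tuple[float, ...]
    action: int
    reward: float
    next_state: Tuple[float, ...]
    done: bool
    # Hypothetical field: value-network output recorded at insertion time,
    # which a standard replay buffer would normally discard.
    stored_value: float


class ValueStoringReplayBuffer:
    """Hypothetical FIFO replay buffer that also keeps past value outputs."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # deque(maxlen=...) evicts the oldest transition once full.
        self.buffer: deque = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done, value_output) -> None:
        self.buffer.append(
            Transition(state, action, reward, next_state, done, value_output)
        )

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement, as in a basic DQN buffer.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


def stored_value_gap(batch: Sequence[Transition], current_values: Sequence[float]) -> float:
    """Hypothetical diagnostic, NOT the paper's bound: mean absolute gap
    between current value estimates and the stored past outputs."""
    total = sum(abs(c - t.stored_value) for c, t in zip(current_values, batch))
    return total / len(batch)


# Usage: capacity 3, five insertions, so the two oldest are evicted.
buf = ValueStoringReplayBuffer(capacity=3)
for step in range(5):
    buf.add(
        state=(float(step),),
        action=step % 2,
        reward=1.0,
        next_state=(float(step + 1),),
        done=False,
        value_output=float(step),
    )
batch = buf.sample(2)
# Pretend the current network's estimates drifted by +0.5 from the stored ones.
gap = stored_value_gap(batch, [t.stored_value + 0.5 for t in batch])
```

Keeping the stored value inside the same record as the transition means it is sampled and evicted together with that transition, so no separate index has to be kept in sync.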
