Marginalized Operators for Off-policy Reinforcement Learning
Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko

TL;DR
This paper introduces marginalized operators for off-policy reinforcement learning: a scalable generalization of existing multi-step operators, with potential variance reduction in sample-based estimates and empirical performance gains.
Contribution
It proposes marginalized operators, a new class of off-policy evaluation operators that strictly generalize multi-step operators such as Retrace.
Findings
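In the authors' experiments, marginalized operators provide performance gains in off-policy evaluation.
Their estimates can be computed in a scalable way and offer potential variance reduction compared to sample-based estimates of the original multi-step operators.
The gains carry over to downstream policy optimization algorithms.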
Abstract
In this work, we propose marginalized operators, a new class of off-policy evaluation operators for reinforcement learning. Marginalized operators strictly generalize generic multi-step operators, such as Retrace, as special cases. Marginalized operators also suggest a form of sample-based estimates with potential variance reduction, compared to sample-based estimates of the original multi-step operators. We show that the estimates for marginalized operators can be computed in a scalable way, which also generalizes prior results on marginalized importance sampling as special cases. Finally, we empirically demonstrate that marginalized operators provide performance gains to off-policy evaluation and downstream policy optimization algorithms.
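For context, the following is a minimal sketch of the standard Retrace multi-step target, one of the operators the abstract says marginalized operators generalize. It is not the paper's marginalized operator. The function name `retrace_target`, the array layout, and the default parameters are assumptions made for illustration; the trace coefficient c_s = lam * min(1, pi/mu) follows the usual Retrace definition.

```python
import numpy as np

def retrace_target(q, pi, mu, actions, rewards, gamma=0.99, lam=1.0):
    """Retrace target for Q(x_0, a_0) along one trajectory (illustrative sketch).

    q, pi, mu : arrays of shape (T + 1, A) holding Q-values, target-policy
                probabilities and behavior-policy probabilities at x_0..x_T.
    actions   : int array of shape (T,), actions a_0..a_{T-1} drawn from mu.
    rewards   : float array of shape (T,), rewards r_0..r_{T-1}.
    """
    actions = np.asarray(actions)
    rewards = np.asarray(rewards, dtype=float)
    T = len(actions)
    idx = np.arange(T)

    q_taken = q[idx, actions]                    # Q(x_t, a_t)
    v_next = (pi[1:] * q[1:]).sum(axis=1)        # E_pi[Q(x_{t+1}, .)]
    delta = rewards + gamma * v_next - q_taken   # TD errors

    # Truncated per-step importance ratios c_t = lam * min(1, pi / mu).
    c = lam * np.minimum(1.0, pi[idx, actions] / mu[idx, actions])

    # Trace for step t is the product c_1 * ... * c_t (empty product = 1 at t = 0).
    trace = np.concatenate(([1.0], np.cumprod(c[1:])))

    return q_taken[0] + np.sum(gamma ** idx * trace * delta)
```

The trace here is a product of per-step ratios along the trajectory, which is the quantity that can drive up variance in multi-step estimates. Per the abstract, marginalized operators suggest sample-based estimates with potential variance reduction relative to such estimates.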
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Retrace
