Marginalized Operators for Off-policy Reinforcement Learning

Yunhao Tang; Mark Rowland; Rémi Munos; Michal Valko

arXiv:2203.16177·cs.LG·March 31, 2022

TL;DR

This paper introduces marginalized operators for off-policy reinforcement learning, offering a scalable, variance-reducing alternative to existing multi-step operators, with demonstrated empirical performance improvements.

Contribution

The paper proposes marginalized operators, a new class of off-policy evaluation operators that generalizes multi-step operators such as Retrace and aims to improve the efficiency and accuracy of off-policy evaluation.

Findings

01 Marginalized operators yield empirical performance gains in off-policy evaluation.

02 Their sample-based estimates can be computed in a scalable way and offer potential variance reduction relative to estimates of the original multi-step operators.

03 The gains carry over to downstream policy optimization algorithms.

Abstract

In this work, we propose marginalized operators, a new class of off-policy evaluation operators for reinforcement learning. Marginalized operators strictly generalize generic multi-step operators, such as Retrace, as special cases. Marginalized operators also suggest a form of sample-based estimates with potential variance reduction, compared to sample-based estimates of the original multi-step operators. We show that the estimates for marginalized operators can be computed in a scalable way, which also generalizes prior results on marginalized importance sampling as special cases. Finally, we empirically demonstrate that marginalized operators provide performance gains to off-policy evaluation and downstream policy optimization algorithms.
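For orientation, the following is a minimal sketch of the sample-based Retrace estimate, the multi-step operator that the abstract names as a special case. It follows the standard Retrace form with truncated importance ratios c_s = λ min(1, π(a_s|x_s) / μ(a_s|x_s)); it does not implement the marginalized operator itself, whose definition is not given on this page. The function name `retrace_target` and its argument layout are illustrative assumptions, not taken from the paper.

```python
def retrace_target(q_sa, rewards, exp_next_q, pi_probs, mu_probs,
                   gamma=0.99, lam=1.0):
    """Retrace estimate for the first state-action pair of one trajectory.

    All argument names are hypothetical; each is a list indexed by time step t:
      q_sa[t]       : Q(x_t, a_t)
      rewards[t]    : r_t
      exp_next_q[t] : E_{a ~ pi} Q(x_{t+1}, a), 0 at a terminal state
      pi_probs[t]   : target policy probability pi(a_t | x_t)
      mu_probs[t]   : behavior policy probability mu(a_t | x_t)
    """
    target = q_sa[0]
    trace = 1.0  # product c_1 * ... * c_t; empty product at t = 0
    for t in range(len(rewards)):
        if t > 0:
            # Truncated importance ratio keeps the trace product bounded.
            trace *= lam * min(1.0, pi_probs[t] / mu_probs[t])
        # One-step TD error under the target policy.
        td_error = rewards[t] + gamma * exp_next_q[t] - q_sa[t]
        target += (gamma ** t) * trace * td_error
    return target


# On-policy toy trajectory: all ratios are 1, so the estimate reduces to
# the discounted sum of TD errors added to Q(x_0, a_0).
print(retrace_target([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0],
                     [0.5, 0.5, 0.5], [0.5, 0.5, 0.5], gamma=0.5))
```

Each correction term in this estimate is weighted by a product of per-step ratios along the trajectory. The abstract contrasts this kind of estimate with the marginalized form, which it describes as having potential variance reduction.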

Code & Models

Datasets

misovalko/my-research-papers
dataset · 21 dl

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Retrace