Conditional Importance Sampling for Off-Policy Learning

Mark Rowland; Anna Harutyunyan; Hado van Hasselt; Diana Borsa; Tom Schaul; Rémi Munos; Will Dabney

arXiv:1910.07479·cs.LG·July 31, 2020

TL;DR

This paper introduces a conceptual framework for off-policy reinforcement learning based on conditional expectations of importance sampling ratios. The framework gives new perspectives on existing off-policy algorithms and exposes a broad space of unexplored ones.

Contribution

It presents a novel framework based on conditional expectations of importance sampling ratios, enhancing understanding and expanding the space of off-policy algorithms.

Findings

01. The framework provides new perspectives on, and a better understanding of, existing off-policy algorithms.

02. The paper theoretically analyses the space of algorithms that the framework reveals.

03. Several algorithms that arise from the framework are investigated concretely.

Abstract

The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
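To make the central idea concrete, here is a minimal sketch of replacing an importance sampling ratio with its conditional expectation. It is an illustration under stated assumptions, not one of the paper's algorithms: the two-step toy problem, the policies `MU` and `PI`, the choice to condition on the return, and all function names are hypothetical. Every trajectory is enumerated exactly, so no sampling is involved.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy problem (not from the paper): two-step trajectories of
# binary actions, with the same action distribution at every step.
MU = {0: 0.5, 1: 0.5}  # behaviour policy (assumed)
PI = {0: 0.2, 1: 0.8}  # target policy (assumed)
HORIZON = 2


def toy_return(traj):
    # Assumed return: reward 1 if the final action is 1, otherwise 0.
    return float(traj[-1])


def enumerate_trajectories():
    """Return (probability under MU, full IS ratio, return) per trajectory."""
    rows = []
    for traj in product((0, 1), repeat=HORIZON):
        prob, rho = 1.0, 1.0
        for a in traj:
            prob *= MU[a]
            rho *= PI[a] / MU[a]
        rows.append((prob, rho, toy_return(traj)))
    return rows


def conditional_ratios(rows):
    """E[rho | G = g] under MU, for each attainable return value g."""
    num, den = defaultdict(float), defaultdict(float)
    for prob, rho, g in rows:
        num[g] += prob * rho
        den[g] += prob
    return {g: num[g] / den[g] for g in den}


def mean_and_variance(rows, weight_fn):
    """Exact mean and variance of weight * G under the behaviour policy."""
    mean = sum(p * weight_fn(rho, g) * g for p, rho, g in rows)
    second = sum(p * (weight_fn(rho, g) * g) ** 2 for p, rho, g in rows)
    return mean, second - mean ** 2


rows = enumerate_trajectories()
cond = conditional_ratios(rows)

# Ordinary importance sampling: weight each return by the full ratio.
is_mean, is_var = mean_and_variance(rows, lambda rho, g: rho)
# Conditional importance sampling: weight by E[rho | G] instead.
cis_mean, cis_var = mean_and_variance(rows, lambda rho, g: cond[g])

print("IS  mean/variance:", is_mean, is_var)
print("CIS mean/variance:", cis_mean, cis_var)
```

In this toy setting both estimators have the same expectation, the target policy's expected return of 0.8, while the conditional version has strictly smaller variance, as the law of total variance predicts whenever the ratio is not already a function of the conditioning variable.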

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Age of Information Optimization