Conditional Importance Sampling for Off-Policy Learning
Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana Borsa, Tom Schaul, Rémi Munos, Will Dabney

TL;DR
This paper introduces a new conceptual framework for off-policy reinforcement learning using conditional importance sampling, offering fresh insights and revealing new algorithmic possibilities.
Contribution
It presents a framework based on conditional expectations of importance sampling ratios, which yields new perspectives on existing off-policy algorithms and reveals a broad space of unexplored ones.
Findings
The framework provides new perspectives on existing algorithms.
The paper theoretically analyses the space of algorithms that the framework reveals.
It concretely investigates several algorithms that arise from the framework.
Abstract
The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
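To make the phrase "conditional expectations of importance sampling ratios" concrete, here is a minimal sketch on a toy one-step problem. It is an illustration under stated assumptions, not the paper's algorithm: the three-action setup, the policies `mu` and `pi`, the rewards, and the choice to condition on the observed reward are all hypothetical. It compares the ordinary ratio pi(a)/mu(a) with its conditional expectation given the reward, and computes the mean and variance of both estimators exactly.

```python
# Hypothetical one-step problem: three actions, two of which share a reward.
rewards = [1.0, 1.0, 0.0]
mu = [0.6, 0.2, 0.2]  # behaviour policy (assumed for illustration)
pi = [0.1, 0.5, 0.4]  # target policy (assumed for illustration)


def ordinary_weights(pi, mu):
    """Per-action importance sampling ratios pi(a) / mu(a)."""
    return [p / m for p, m in zip(pi, mu)]


def conditional_weights(pi, mu, rewards):
    """E_mu[ratio | reward] for each action.

    Actions that share a reward share a weight:
    (sum of pi over the group) / (sum of mu over the group).
    """
    weights = []
    for r in rewards:
        group = [i for i, other in enumerate(rewards) if other == r]
        weights.append(sum(pi[i] for i in group) / sum(mu[i] for i in group))
    return weights


def mean_and_variance(weights, mu, rewards):
    """Exact mean and variance of weight * reward when actions are drawn from mu."""
    mean = sum(m * w * r for m, w, r in zip(mu, weights, rewards))
    second_moment = sum(m * (w * r) ** 2 for m, w, r in zip(mu, weights, rewards))
    return mean, second_moment - mean ** 2


target_value = sum(p * r for p, r in zip(pi, rewards))
is_mean, is_var = mean_and_variance(ordinary_weights(pi, mu), mu, rewards)
cis_mean, cis_var = mean_and_variance(conditional_weights(pi, mu, rewards), mu, rewards)

print("target value:", target_value)
print("ordinary IS:    mean", is_mean, "variance", is_var)
print("conditional IS: mean", cis_mean, "variance", cis_var)
```

In this toy case both estimators have the target policy's expected reward as their mean, while the conditional weights give a smaller variance, because averaging the ratio over actions that are indistinguishable by reward removes variation that carries no information about the return.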
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Age of Information Optimization
