Unified PAC-Bayesian Study of Pessimism for Offline Policy Learning with Regularized Importance Sampling

Imad Aouali; Victor-Emmanuel Brunel; David Rohde; Anna Korba

arXiv:2406.03434 · cs.LG · June 6, 2024

Open Access

TL;DR

This paper introduces a unified PAC-Bayesian framework to analyze pessimism in offline policy learning, demonstrating its effectiveness and enabling comparison of importance weight regularizations.

Contribution

It provides the first comprehensive PAC-Bayesian analysis of regularized importance sampling in offline policy learning, unifying various approaches under one theoretical framework.

Findings

01. PAC-Bayesian bound applies to common importance weight regularizations

02. Empirical results show standard importance weight regularization techniques are effective

03. Framework enables comparison of different regularization strategies

Abstract

Off-policy learning (OPL) often involves minimizing a risk estimator based on importance weighting to correct the bias introduced by the logging policy used to collect data. However, this method can produce a high-variance estimator. A common solution is to regularize the importance weights and learn the policy by minimizing an estimator with penalties derived from generalization bounds specific to the estimator. This approach, known as pessimism, has gained recent attention but lacks a unified framework for analysis. To address this gap, we introduce a comprehensive PAC-Bayesian framework to examine pessimism with regularized importance weighting. We derive a tractable PAC-Bayesian generalization bound that universally applies to common importance weight regularizations, enabling their comparison within a single framework. Our empirical results challenge common understanding, demonstrating…
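To make the variance issue in the abstract concrete, here is a minimal sketch of an importance-weighted risk estimate with an optional weight regularization. It is not the paper's estimator or bound: the function name `ips_risk`, the toy data, and the choice of clipping as the regularization are all assumptions made for illustration. Clipping is only one of several weight regularizations in common use.

```python
def ips_risk(costs, target_probs, logging_probs, clip=None):
    """Importance-weighted risk estimate from logged data (illustrative sketch).

    costs[i]          observed cost of the i-th logged action
    target_probs[i]   probability the evaluated policy assigns to that action
    logging_probs[i]  probability the logging policy assigned to that action
    clip              optional upper bound on the weights (assumed regularization)
    """
    total = 0.0
    for cost, p_target, p_logging in zip(costs, target_probs, logging_probs):
        weight = p_target / p_logging      # importance weight
        if clip is not None:
            weight = min(weight, clip)     # regularize: cap large weights
        total += weight * cost
    return total / len(costs)


# Hypothetical toy data: the first sample has a small logging probability,
# so its weight (about 9) dominates the unregularized estimate.
costs = [1.0, 0.0, 1.0, 1.0]
target_probs = [0.9, 0.5, 0.2, 0.8]
logging_probs = [0.1, 0.5, 0.4, 0.8]

plain = ips_risk(costs, target_probs, logging_probs)
clipped = ips_risk(costs, target_probs, logging_probs, clip=2.0)
print(f"unregularized: {plain:.3f}, clipped at 2: {clipped:.3f}")
```

In this toy case a single rarely logged action drives the unregularized estimate, and capping the weights reduces its influence at the cost of some bias. A pessimistic learner in the sense of the abstract would additionally add a penalty derived from a generalization bound before minimizing. That penalty is not shown here because this page does not state it.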

Taxonomy

Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Bayesian Modeling and Causal Inference