PAC-Bayesian Reward-Certified Outcome Weighted Learning

Yuya Ishikawa; Shu Tamano

arXiv:2604.01946·cs.LG·April 3, 2026

TL;DR

PROWL is a PAC-Bayesian framework for estimating individualized treatment rules (ITRs) when observed rewards are uncertain. It provides finite-sample guarantees and, under severe reward uncertainty, better policy performance than standard methods.

Contribution

The paper develops a PAC-Bayesian approach that incorporates reward uncertainty into outcome weighted learning (OWL), with theoretical guarantees and an automated calibration procedure.

Findings

01. PROWL outperforms standard methods under severe reward uncertainty.

02. It provides a finite-sample PAC-Bayes lower bound for randomized ITRs.

03. The method includes an automated, bounds-based calibration procedure.

Abstract

Estimating optimal individualized treatment rules (ITRs) via outcome weighted learning (OWL) often relies on observed rewards that are noisy or optimistic proxies for the true latent utility. Ignoring this reward uncertainty leads to the selection of policies with inflated apparent performance, yet existing OWL frameworks lack the finite-sample guarantees required to systematically embed such uncertainty into the learning objective. To address this issue, we propose PAC-Bayesian Reward-Certified Outcome Weighted Learning (PROWL). Given a one-sided uncertainty certificate, PROWL constructs a conservative reward and a strictly policy-dependent lower bound on the true expected value. Theoretically, we prove an exact certified reduction that transforms robust policy learning into a unified, split-free cost-sensitive classification task. This formulation enables the derivation of a…
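The idea of replacing an optimistic observed reward with a conservative one can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the certificate is a non-negative per-sample amount subtracted from the observed reward, and it uses the standard inverse-propensity-weighted (IPW) value estimate from outcome weighted learning rather than PROWL's PAC-Bayes objective. The function names `conservative_reward` and `ipw_value` and all data values are hypothetical.

```python
import numpy as np


def conservative_reward(observed_reward, certificate):
    """Subtract a one-sided certificate from the observed reward.

    Assumed form (not taken from the paper): certificate >= 0 bounds how far
    the observed reward may overstate the true latent utility.
    """
    observed_reward = np.asarray(observed_reward, dtype=float)
    certificate = np.asarray(certificate, dtype=float)
    if np.any(certificate < 0):
        raise ValueError("certificate must be non-negative")
    return observed_reward - certificate


def ipw_value(policy_actions, actions, rewards, propensities):
    """Standard IPW value estimate of a deterministic treatment rule.

    Averages reward / propensity over samples where the rule's recommended
    action matches the action actually taken; other samples contribute 0.
    """
    agree = (np.asarray(policy_actions) == np.asarray(actions)).astype(float)
    weights = np.asarray(rewards, dtype=float) / np.asarray(propensities, dtype=float)
    return float(np.mean(agree * weights))


# Toy data (hypothetical): four patients, binary treatment coded as +1 / -1,
# randomized with propensity 0.5.
actions = np.array([1, -1, 1, -1])
policy_actions = np.array([1, 1, 1, -1])      # rule under evaluation
observed = np.array([2.0, 1.0, 3.0, 4.0])     # possibly optimistic rewards
certificate = np.array([0.5, 0.0, 1.0, 2.0])  # assumed one-sided uncertainty
propensities = np.full(4, 0.5)

naive = ipw_value(policy_actions, actions, observed, propensities)
certified = ipw_value(
    policy_actions,
    actions,
    conservative_reward(observed, certificate),
    propensities,
)
```

In this toy setting the conservative estimate (`certified`) cannot exceed the naive one (`naive`), because the certificate is non-negative and each sample's weight is non-negative. The gap depends on which samples the rule agrees with, which is one way to read the abstract's description of the lower bound as policy-dependent.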
