Policy Learning with Abstention
Ayush Sawarni, Jikai Jin, Justin Whitehouse, Vasilis Syrgkanis

TL;DR
This paper studies policy learning with abstention, in which a policy may defer to a safe default or an expert instead of forcing a decision when predictions are uncertain, and it proves regret guarantees for a two-stage learner in this setting.
Contribution
It proposes a two-stage learner that first identifies a set of near-optimal policies and then builds an abstention rule from their disagreements, extends the regret guarantees from known to unknown propensities, and shows that abstention applies to other core problems in policy learning.
Findings
Achieves fast O(1/n)-type regret guarantees when propensities are known.
Extends guarantees to unknown propensities using a doubly robust objective.
Shows that abstention is a versatile tool with direct applications to other core problems in policy learning.
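For readers unfamiliar with the two kinds of objective mentioned above, the textbook inverse-propensity-weighted (IPW) and doubly robust (DR, also called AIPW) scores are written out below. This is the standard construction, not necessarily the exact objective used in the paper; the symbols e (propensity), \hat{e} (estimated propensity), \hat{\mu} (estimated outcome regression), and \hat{\Gamma} (per-sample score) are notation assumed here for illustration.

```latex
% Known propensities: IPW score for action a on sample i
\hat{\Gamma}^{\mathrm{IPW}}_i(a) = \frac{\mathbf{1}\{A_i = a\}}{e(a \mid X_i)}\, Y_i

% Unknown propensities: doubly robust (AIPW) score with estimated nuisances
\hat{\Gamma}^{\mathrm{DR}}_i(a) = \hat{\mu}(X_i, a)
  + \frac{\mathbf{1}\{A_i = a\}}{\hat{e}(a \mid X_i)}\bigl(Y_i - \hat{\mu}(X_i, a)\bigr)

% Either score yields an estimated value for a policy \pi
\hat{V}(\pi) = \frac{1}{n} \sum_{i=1}^{n} \hat{\Gamma}_i\bigl(\pi(X_i)\bigr)
```

The DR score remains consistent if either the outcome model or the propensity model is estimated well, which is the usual reason for preferring it when propensities are not known.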
Abstract
Policy learning algorithms are widely used in areas such as personalized medicine and advertising to develop individualized treatment regimes. However, most methods force a decision even when predictions are uncertain, which is risky in high-stakes settings. We study policy learning with abstention, where a policy may defer to a safe default or an expert. When a policy abstains, it receives a small additive reward on top of the value of a random guess. We propose a two-stage learner that first identifies a set of near-optimal policies and then constructs an abstention rule from their disagreements. We establish fast O(1/n)-type regret guarantees when propensities are known, and extend these guarantees to the unknown-propensity case via a doubly robust (DR) objective. We further show that abstention is a versatile tool with direct applications to other core problems in policy learning:…
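The two-stage idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes binary actions, known propensities with plain IPW scores, a hypothetical policy class of one-dimensional threshold rules ("treat iff x > t"), and a user-chosen `slack` that defines "near-optimal". The names `ipw_scores`, `fit_abstaining_policy`, `ABSTAIN`, and `slack` are all invented for this example.

```python
import numpy as np

ABSTAIN = -1  # hypothetical label meaning "defer to the safe default or expert"

def ipw_scores(y, a, propensity, n_actions=2):
    """IPW score matrix: entry (i, k) is 1{a_i == k} * y_i / propensity_i.

    `propensity` is the (assumed known) probability of the observed action a_i.
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=int)
    scores = np.zeros((len(y), n_actions))
    scores[np.arange(len(y)), a] = y / np.asarray(propensity, dtype=float)
    return scores

def fit_abstaining_policy(x, scores, thresholds, slack):
    """Two-stage sketch over threshold policies pi_t(x) = 1{x > t}."""
    x = np.asarray(x, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    # actions[k, i] is the action policy k takes on sample i
    actions = (x[None, :] > thresholds[:, None]).astype(int)
    # Estimated value of each policy: mean score of the actions it picks
    values = scores[np.arange(len(x))[None, :], actions].mean(axis=1)
    # Stage 1: keep every policy whose estimated value is within `slack` of the best
    kept = thresholds[values >= values.max() - slack]
    best = thresholds[np.argmax(values)]

    def decide(x_new):
        # Stage 2: abstain wherever the kept policies disagree,
        # otherwise follow the empirically best policy
        x_new = np.asarray(x_new, dtype=float)
        votes = x_new[None, :] > kept[:, None]
        disagree = votes.any(axis=0) & ~votes.all(axis=0)
        return np.where(disagree, ABSTAIN, (x_new > best).astype(int))

    return decide, kept

# Synthetic usage: uniform randomization (propensity 0.5); action 1 helps iff x > 0
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1, 1, n)
a = rng.integers(0, 2, n)
y = np.where(a == 1, x, -x) + rng.normal(0, 0.5, n)
decide, kept = fit_abstaining_policy(
    x, ipw_scores(y, a, np.full(n, 0.5)), np.linspace(-0.9, 0.9, 19), slack=0.05
)
print(decide([-0.8, 0.0, 0.8]), kept)
```

With `slack=0` and a unique best policy the rule never abstains; a larger `slack` keeps more policies and widens the region of disagreement where the rule defers. How the slack should be set to obtain the paper's guarantees is not covered by this sketch.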
Taxonomy
Topics: Advanced Causal Inference Techniques · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning
