CSPI-MT: Calibrated Safe Policy Improvement with Multiple Testing for Threshold Policies
Brian M Cho, Ana-Roxana Pop, Kyra Gan, Sam Corbett-Davies, Israel Nir, Ariel Evnine, Nathan Kallus

TL;DR
This paper introduces CSPI-MT, a method for safe policy improvement in high-risk settings that uses multiple testing to better identify policies that reliably outperform a baseline, especially in challenging data conditions.
Contribution
The work develops a novel multiple testing approach for safe policy improvement, enhancing detection power and safety guarantees over existing methods for threshold policies.
Findings
Improves safety test power in synthetic and real datasets.
Achieves higher policy improvement detection rates than existing methods.
Keeps the probability of adopting a policy worse than the baseline at or below the pre-specified error level.
Abstract
When modifying existing policies in high-risk settings, it is often necessary to ensure with high certainty that the newly proposed policy improves upon a baseline, such as the status quo. In this work, we consider the problem of safe policy improvement, where one only adopts a new policy if it is deemed to be better than the specified baseline with at least pre-specified probability. We focus on threshold policies, a ubiquitous class of policies with applications in economics, healthcare, and digital advertising. Existing methods rely on potentially underpowered safety checks and limit the opportunities for finding safe improvements, so too often they must revert to the baseline to maintain safety. We overcome these issues by leveraging the most powerful safety test in the asymptotic regime and allowing for multiple candidates to be tested for improvement over the baseline. We show…
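The adopt-only-if-safe logic described in the abstract can be illustrated with a minimal sketch. This is not the CSPI-MT procedure itself: it swaps in a plain one-sided z-test with a Bonferroni correction across candidates, which is a simpler and generally more conservative way to test several candidates at once. The function name `safe_select`, the `diff_scores` input (per-unit estimates of each candidate's value minus the baseline's value), and the example threshold names are all hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def safe_select(diff_scores, alpha=0.05):
    """Pick a candidate policy only if it passes a safety test.

    diff_scores: dict mapping a candidate name to a list of per-unit
        estimates of (candidate value - baseline value). How these
        estimates are produced is assumed, not specified here.
    alpha: allowed probability of adopting a policy no better than baseline.

    Returns the name of the adopted candidate, or None to keep the baseline.
    """
    k = len(diff_scores)
    # Bonferroni correction (an assumption of this sketch): each
    # one-sided test runs at level alpha / k.
    z_crit = NormalDist().inv_cdf(1 - alpha / k)

    best_name, best_mean = None, 0.0
    for name, scores in diff_scores.items():
        m = mean(scores)
        se = stdev(scores) / math.sqrt(len(scores))
        if se == 0:
            continue  # no variability, so the z-test is undefined; skip
        # Safety check: the improvement must be significantly above zero.
        if m / se > z_crit and m > best_mean:
            best_name, best_mean = name, m
    return best_name


# Hypothetical candidates: one clear improvement, one pure noise.
candidates = {
    "threshold_0.3": [1.0, 1.2, 0.8, 1.1, 0.9] * 4,
    "threshold_0.7": [1.0, -1.0] * 10,
}
chosen = safe_select(candidates, alpha=0.05)
```

Returning `None` corresponds to reverting to the baseline when no candidate clears the safety check. Because Bonferroni splits the error budget evenly across candidates, adding weak candidates makes every test harder to pass, which is the kind of power loss a more careful multiple testing scheme aims to reduce.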
Taxonomy
Topics: Risk and Safety Analysis · Software Reliability and Analysis Research
