Just Trial Once: Ongoing Causal Validation of Machine Learning Models
Jacob M. Chen, Michael Oberst

TL;DR
This paper proposes a method to evaluate the causal impact of updated machine learning models using data from prior randomized controlled trials, reducing the need for costly new experiments.
Contribution
It gives conditions under which the causal effect of a new ML model version can be estimated or bounded from existing RCT data, even when that version was not part of the trial, while accounting for model updates and user trust.
Findings
Provides bounds on the causal impact of new ML models without running new RCTs
Recommends trial design strategies to better assess future model updates
Offers practical guidelines to save resources in model deployment
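To make the idea of bounding an effect from existing trial data concrete, here is a minimal sketch of a generic worst-case bound. It is not the paper's estimator. It rests on an illustrative assumption that is not stated in this summary: when the new model's output matches the trialed model's output for a user, that user's outcome would be unchanged, and when the outputs differ, the outcome could be anything in a known range. The function name `worst_case_effect_bounds` and all of its parameters are hypothetical.

```python
from statistics import fmean


def worst_case_effect_bounds(y_ml, agree, y_control, y_min=0.0, y_max=1.0):
    """Worst-case bounds on the effect of a new model versus control.

    y_ml      : outcomes of users in the trial's ML arm
    agree     : True where the new model's output equals the trialed model's
    y_control : outcomes of users in the trial's control arm
    y_min/max : known range of the outcome (assumed, e.g. 0/1 for binary)
    """
    # ASSUMPTION (illustrative only): where the two models agree, reuse the
    # observed outcome; where they differ, substitute the extreme values.
    lower_mean = fmean([y if a else y_min for y, a in zip(y_ml, agree)])
    upper_mean = fmean([y if a else y_max for y, a in zip(y_ml, agree)])

    control_mean = fmean(y_control)
    return lower_mean - control_mean, upper_mean - control_mean


# Toy data: the new model disagrees with the trialed model for one user.
lo, hi = worst_case_effect_bounds(
    y_ml=[1, 0, 1, 1],
    agree=[True, True, False, True],
    y_control=[0, 1, 0, 0],
)
print(lo, hi)  # 0.25 0.5
```

In this sketch the interval narrows as the new model agrees with the trialed model on more users, and it collapses to a point estimate when the two agree everywhere.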
Abstract
Machine learning (ML) models are increasingly used as decision-support tools in high-risk domains. Evaluating the causal impact of deploying such models can be done with a randomized controlled trial (RCT) that randomizes users to ML vs. control groups and assesses the effect on relevant outcomes. However, ML models are inevitably updated over time, and we often lack evidence for the causal impact of these updates. While the causal effect could be repeatedly validated with ongoing RCTs, such experiments are expensive and time-consuming to run. In this work, we present an alternative solution: using only data from a prior RCT, we give conditions under which the causal impact of a new ML model can be precisely bounded or estimated, even if it was not included in the RCT. Our assumptions incorporate two realistic constraints: ML predictions are often deterministic, and their impacts depend…
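As a point of reference for the RCT setup the abstract describes (users randomized to ML vs. control groups), the following is a minimal sketch of the standard difference-in-means estimate of the average treatment effect. The function name, variable names, and toy data are hypothetical, and the sketch assumes simple randomization with one outcome per user; the paper's own estimators may differ.

```python
import math
from statistics import fmean, variance


def difference_in_means(y_treated, y_control):
    """Estimate the average treatment effect from a two-arm RCT.

    Returns the difference in arm means and its standard error,
    computed from the unpooled sample variances of the two arms.
    """
    effect = fmean(y_treated) - fmean(y_control)
    se = math.sqrt(
        variance(y_treated) / len(y_treated)
        + variance(y_control) / len(y_control)
    )
    return effect, se


# Toy outcomes (1 = good outcome) for the ML arm and the control arm.
ate, se = difference_in_means([1, 1, 0, 1], [0, 1, 0, 0])
print(f"estimated effect = {ate:.2f}, standard error = {se:.3f}")
```

This estimate applies only to the model version that was actually trialed, which is why a model update raises the question the paper addresses.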
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
