Statistically Valid Post-Deployment Monitoring Should Be Standard for AI-Based Digital Health
Pavel Dolin, Weizhi Li, Gautam Dasarathy, Visar Berisha

TL;DR
This paper advocates for the adoption of statistically valid, label-efficient testing frameworks as a standard for post-deployment monitoring of clinical AI, to ensure safety, reliability, and regulatory compliance in real-world healthcare settings.
Contribution
It introduces a formal, statistically rigorous approach to post-deployment monitoring, framing change detection as hypothesis testing to improve reliability and reproducibility.
Findings
Current monitoring practices are manual and reactive.
Statistically valid testing provides explicit guarantees on error rates (e.g., Type I/II error).
The proposed framework supports reproducibility and formal inference.
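To illustrate what framing change detection as hypothesis testing with an explicit error guarantee can look like, here is a minimal sketch. It is not the authors' method. It assumes that per-case error indicators (1 = model wrong, 0 = model right) are available for a reference window and a post-deployment window, and it does not address label efficiency. The function names `permutation_p_value` and `detect_change` are hypothetical. A permutation test is used because, under the null hypothesis that the two windows are exchangeable, rejecting when the p-value is at most `alpha` keeps the Type I error rate at or below `alpha`.

```python
import random


def permutation_p_value(reference, current, n_permutations=2000, seed=0):
    """Two-sample permutation test on the absolute difference in mean error.

    reference, current: sequences of 0/1 error indicators (assumed labeled).
    Returns a p-value for the null hypothesis of no change between windows.
    """
    rng = random.Random(seed)
    n_ref = len(reference)
    n_cur = len(current)
    observed = abs(sum(current) / n_cur - sum(reference) / n_ref)

    pooled = list(reference) + list(current)
    exceed = 0
    for _ in range(n_permutations):
        # Under the null, window membership is arbitrary, so reshuffle it.
        rng.shuffle(pooled)
        ref_mean = sum(pooled[:n_ref]) / n_ref
        cur_mean = sum(pooled[n_ref:]) / n_cur
        if abs(cur_mean - ref_mean) >= observed:
            exceed += 1

    # The add-one correction keeps the p-value valid and never exactly zero.
    return (exceed + 1) / (n_permutations + 1)


def detect_change(reference, current, alpha=0.05):
    """Flag a change when the permutation p-value is at most alpha."""
    return permutation_p_value(reference, current) <= alpha


if __name__ == "__main__":
    # Illustrative, made-up windows: 10% error before, 40% error after.
    reference_errors = [0] * 90 + [1] * 10
    current_errors = [0] * 60 + [1] * 40
    print(detect_change(reference_errors, current_errors))
```

The significance level `alpha` is fixed before looking at the data, which is what makes the false-alarm rate an explicit, pre-specified quantity rather than a by-product of ad hoc review. Repeated monitoring over time would need a sequential or multiplicity-corrected procedure, which this sketch does not include.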
Abstract
This position paper argues that post-deployment monitoring in clinical AI is underdeveloped and proposes statistically valid and label-efficient testing frameworks as a principled foundation for ensuring reliability and safety in real-world deployment. A recent review found that only 9% of FDA-registered AI-based healthcare tools include a post-deployment surveillance plan. Existing monitoring approaches are often manual, sporadic, and reactive, making them ill-suited for the dynamic environments in which clinical models operate. We contend that post-deployment monitoring should be grounded in label-efficient and statistically valid testing frameworks, offering a principled alternative to current practices. We use the term "statistically valid" to refer to methods that provide explicit guarantees on error rates (e.g., Type I/II error), enable formal inference under pre-defined…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education · Healthcare Technology and Patient Monitoring
