Designing monitoring strategies for deployed machine learning algorithms: navigating performativity through a causal lens

Jean Feng; Adarsh Subbaswamy; Alexej Gossmann; Harvineet Singh; Berkman Sahiner; Mi-Ok Kim; Gene Pennello; Nicholas Petrick; Romain Pirracchio; Fan Xia

arXiv:2311.11463·cs.LG·February 27, 2024·1 cites

Open Access

TL;DR

This paper explores how to systematically design monitoring strategies for deployed machine learning algorithms by applying causal inference, highlighting the importance of choosing appropriate metrics and data sources to account for performativity effects.

Contribution

It introduces a causal framework for selecting monitoring criteria and data sources, demonstrating their impact through simulation in a real-world risk prediction case study.

Findings

01 Not all monitoring systems perform equally in detecting issues.

02 Causal reasoning improves the design of effective monitoring strategies.

03 Different data sources and criteria significantly affect detection speed and interpretability.

Abstract

After a machine learning (ML)-based system is deployed, monitoring its performance is important to ensure the safety and effectiveness of the algorithm over time. When an ML algorithm interacts with its environment, the algorithm can affect the data-generating mechanism and be a major source of bias when evaluating its standalone performance, an issue known as performativity. Although prior work has shown how to validate models in the presence of performativity using causal inference techniques, there has been little work on how to monitor models in the presence of performativity. Unlike the setting of model validation, there is much less agreement on which performance metrics to monitor. Different monitoring criteria impact how interpretable the resulting test statistic is, what assumptions are needed for identifiability, and the speed of detection. When this choice is further coupled…
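
To make the performativity issue concrete, here is a minimal toy simulation. It is not the paper's method or case study. Every name and number in it (`simulate`, `observed_to_expected`, the 0.5 risk threshold, the 80% treatment rate, the halving of risk under treatment) is an assumption chosen only for illustration. The model is perfectly calibrated for the untreated outcome, yet a naive observed-to-expected ratio computed on post-deployment data makes it look as if it overpredicts, because high predictions trigger treatment that lowers the outcome rate.

```python
import random

def simulate(n=100_000, treat_prob=0.8, effect=0.5, threshold=0.5, seed=0):
    """Toy post-deployment data: high predictions trigger a protective treatment."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()          # true untreated risk of the patient
        pred = x                  # model is calibrated for the untreated outcome
        # Assumption: clinicians treat flagged patients with a fixed probability.
        treated = pred > threshold and rng.random() < treat_prob
        p = x * effect if treated else x   # assumed: treatment halves the risk
        y = 1 if rng.random() < p else 0
        rows.append((pred, treated, y))
    return rows

def observed_to_expected(rows):
    """Ratio of observed events to the sum of predicted risks (1.0 = calibrated)."""
    expected = sum(pred for pred, _, _ in rows)
    observed = sum(y for _, _, y in rows)
    return observed / expected

rows = simulate()
high_risk = [r for r in rows if r[0] > 0.5]

# Naive monitoring: all flagged patients, treated or not.
oe_naive = observed_to_expected(high_risk)
# Monitoring the untreated-outcome criterion: flagged patients who went untreated.
oe_untreated = observed_to_expected([r for r in high_risk if not r[1]])

print(f"naive O/E among flagged patients:     {oe_naive:.2f}")
print(f"untreated O/E among flagged patients: {oe_untreated:.2f}")
```

In this sketch the untreated subgroup recovers a ratio near 1 only because treatment is assigned at random given the prediction. With real data, treatment decisions usually depend on other patient characteristics, so restricting to untreated patients would need additional causal assumptions or adjustment.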

Taxonomy

Topics: Big Data and Business Intelligence · Biomedical and Engineering Education

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Causal inference