Model Monitoring in the Absence of Labeled Data via Feature Attributions Distributions
Carlos Mougan

TL;DR
This thesis proposes monitoring deployed AI models without labeled data by analyzing the distributions of their feature attributions, and applies the approach to both AI alignment and performance monitoring.
Contribution
It introduces a novel approach using feature attribution distributions for model monitoring in unlabeled settings, with theoretical insights and guarantees.
Findings
Effective detection of model behavior changes without labels
Theoretical guarantees for feature attribution-based monitoring
Applicability to AI alignment and performance metrics
Abstract
Model monitoring involves analyzing AI algorithms once they have been deployed and detecting changes in their behaviour. This thesis explores machine learning model monitoring before the predictions impact real-world decisions or users. This step is characterized by one particular condition: the absence of labelled data at test time, which makes it challenging, and often impossible, to calculate performance metrics. The thesis is structured around two main themes: (i) AI alignment, measuring whether AI models behave in a manner consistent with human values, and (ii) performance monitoring, measuring whether the models achieve specific accuracy goals or desires. The thesis uses a common methodology that unifies all its sections. It explores feature attribution distributions for both monitoring dimensions. Using these feature attribution explanations, we can exploit their theoretical…
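As a rough illustration of the idea of comparing feature attribution distributions without labels, the sketch below is one possible reading and not the thesis's actual method. It assumes a hypothetical linear model with fixed weights, uses the attribution w_j * (x_j - mean_j) as a stand-in for a general attribution method, and compares reference and incoming attributions per feature with a two-sample Kolmogorov-Smirnov statistic. All names, weights, and data here are invented for the example.

```python
import random

def linear_attributions(rows, weights, baseline):
    """Attribution of feature j for a linear model: w_j * (x_j - baseline_j)."""
    return [[w * (x - b) for w, x, b in zip(weights, row, baseline)]
            for row in rows]

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        # Advance both samples past every value <= v, then compare the CDFs.
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def attribution_shift(reference, incoming, weights):
    """Per-feature KS statistic between reference and incoming attributions.

    No labels are needed: only inputs and the (assumed known) model weights.
    """
    baseline = [sum(col) / len(col) for col in zip(*reference)]
    ref_attr = linear_attributions(reference, weights, baseline)
    new_attr = linear_attributions(incoming, weights, baseline)
    return [ks_statistic(r, n) for r, n in zip(zip(*ref_attr), zip(*new_attr))]

def sample(rng, means, n=500):
    """Draw n rows of independent unit-variance Gaussians with the given means."""
    return [[rng.gauss(m, 1.0) for m in means] for _ in range(n)]

rng = random.Random(0)
weights = [2.0, 0.0]  # hypothetical model: the second feature is ignored

reference = sample(rng, [0.0, 0.0])
scores_same = attribution_shift(reference, sample(rng, [0.0, 0.0]), weights)
scores_used = attribution_shift(reference, sample(rng, [1.5, 0.0]), weights)
scores_unused = attribution_shift(reference, sample(rng, [0.0, 1.5]), weights)

print("no shift:             ", scores_same)
print("shift in used feature:", scores_used)
print("shift in unused feat.:", scores_unused)
```

In this toy setup, a shift in the feature the model relies on produces a large statistic, while a shift in the ignored feature leaves its attributions identically zero, so it is not flagged. Whether that behaviour is desirable depends on the monitoring goal, and a real deployment would need a proper attribution method and a calibrated decision threshold.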
Taxonomy
Topics: Machine Learning and Data Classification · Bayesian Modeling and Causal Inference
