On the Safety of Interpretable Machine Learning: A Maximum Deviation Approach
Dennis Wei, Rahul Nair, Amit Dhurandhar, Kush R. Varshney, Elizabeth M. Daly, Moninder Singh

TL;DR
This paper introduces a quantitative approach to assess the safety of interpretable machine learning models by measuring their maximum deviation from a safe reference model, using optimization and bandit techniques.
Contribution
It proposes a novel maximum deviation metric for safety assessment, applicable to various models, and demonstrates how interpretability enhances safety evaluation through case studies.
Findings
The maximum deviation can be computed exactly and efficiently for decision trees and for generalized linear and additive models.
For tree ensembles, which are not regarded as interpretable, discrete optimization techniques still yield informative bounds on the maximum deviation.
Interpretability leads to tighter safety bounds and improved model assessment.
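To make the decision-tree finding concrete, here is a minimal sketch of why the computation can be exact. It assumes a setup that is not taken from the paper: the tree is given as a list of leaves, each an axis-aligned box with a constant prediction; the safe reference model is linear; and deviation is measured as absolute difference. The function names and the toy data are hypothetical. On each leaf the tree is constant and the reference is linear, so the largest gap over the box has a closed form, and the overall maximum is the largest value across leaves.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a leaf is (box, value), where box holds one
# (low, high) interval per feature and value is the tree's constant prediction.
Box = List[Tuple[float, float]]
Leaf = Tuple[Box, float]


def linear_range_on_box(w: Sequence[float], b: float, box: Box) -> Tuple[float, float]:
    """Return the exact min and max of w.x + b over an axis-aligned box."""
    lo = hi = b
    for wi, (low, high) in zip(w, box):
        # A linear term is extremized at one of the interval's endpoints.
        lo += min(wi * low, wi * high)
        hi += max(wi * low, wi * high)
    return lo, hi


def max_deviation_tree_vs_linear(leaves: List[Leaf], w: Sequence[float], b: float) -> float:
    """Largest |tree(x) - (w.x + b)| over the union of the leaf boxes.

    Boxes are treated as closed, so the result is a supremum when leaves
    share a boundary.
    """
    best = 0.0
    for box, value in leaves:
        lo, hi = linear_range_on_box(w, b, box)
        # The tree is constant on the leaf, so the gap is largest where the
        # reference is at its minimum or its maximum on the box.
        best = max(best, value - lo, hi - value)
    return best


# Toy example (assumed data): a depth-1 tree on x in [0, 1] that predicts
# 0.0 on [0, 0.5] and 1.0 on [0.5, 1], compared with the reference f0(x) = x.
leaves = [([(0.0, 0.5)], 0.0), ([(0.5, 1.0)], 1.0)]
result = max_deviation_tree_vs_linear(leaves, w=[1.0], b=0.0)
print(result)
```

The cost is linear in the number of leaves times the number of features, which illustrates the sense in which a single tree admits an exact and efficient answer. The paper's own formulation may use a different deviation function and region of interest than this sketch.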
Abstract
Interpretable and explainable machine learning has seen a recent surge of interest. We focus on safety as a key motivation behind the surge and make the relationship between interpretability and safety more quantitative. Toward assessing safety, we introduce the concept of maximum deviation via an optimization problem to find the largest deviation of a supervised learning model from a reference model regarded as safe. We then show how interpretability facilitates this safety assessment. For models including decision trees, generalized linear and additive models, the maximum deviation can be computed exactly and efficiently. For tree ensembles, which are not regarded as interpretable, discrete optimization techniques can still provide informative bounds. For a broader class of piecewise Lipschitz functions, we leverage the multi-armed bandit literature to show that interpretability…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research
