Two-step counterfactual generation for OOD examples
Nawid Keshtmand, Raul Santos-Rodriguez, Jonathan Lawry

TL;DR
This paper explains why a model classifies a data point as out-of-distribution (OOD) by generating counterfactual examples that move between OOD categories, with the aim of improving interpretability in safety-critical AI systems.
Contribution
The paper presents a new method for generating OOD counterfactuals, addressing the gap in explainability for OOD detection in machine learning models.
Findings
Effective generation of OOD counterfactuals demonstrated on synthetic and benchmark datasets.
Comparison against several benchmark methods across a range of metrics shows advantages in interpretability.
Provides insights into model decision boundaries for OOD detection.
Abstract
Two fundamental requirements for the deployment of machine learning models in safety-critical systems are to be able to detect out-of-distribution (OOD) data correctly and to be able to explain the prediction of the model. Although significant effort has gone into both OOD detection and explainable AI, there has been little work on explaining why a model predicts a certain data point is OOD. In this paper, we address this question by introducing the concept of an OOD counterfactual, which is a perturbed data point that iteratively moves between different OOD categories. We propose a method for generating such counterfactuals, investigate its application on synthetic and benchmark data, and compare it to several benchmark methods using a range of metrics.
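The abstract describes an OOD counterfactual as a perturbed data point that iteratively moves between OOD categories. The sketch below is a toy illustration of that idea only, not the authors' method. Everything in it is an assumption made for illustration: the OOD score (squared Mahalanobis distance to a Gaussian fit of the training data), the three category names and their thresholds, the plain gradient-descent update, and the helper names `ood_score`, `category`, and `counterfactual_path`.

```python
import numpy as np

def ood_score(x, mean, cov_inv):
    """Assumed OOD score: squared Mahalanobis distance to the training data."""
    d = x - mean
    return float(d @ cov_inv @ d)

def category(score, near_thr, far_thr):
    """Map a score to a hypothetical OOD category using two assumed thresholds."""
    if score < near_thr:
        return "in-distribution"
    if score < far_thr:
        return "near-OOD"
    return "far-OOD"

def counterfactual_path(x, mean, cov_inv, near_thr, far_thr, step=0.1, max_iter=500):
    """Perturb x by gradient descent on the score, recording each category change."""
    x = np.asarray(x, dtype=float).copy()
    label = category(ood_score(x, mean, cov_inv), near_thr, far_thr)
    path = [(x.copy(), label)]
    for _ in range(max_iter):
        if label == "in-distribution":
            break
        # Gradient of the squared Mahalanobis distance (cov_inv assumed symmetric).
        grad = 2.0 * cov_inv @ (x - mean)
        x = x - step * grad
        new_label = category(ood_score(x, mean, cov_inv), near_thr, far_thr)
        if new_label != label:
            # The point crossed into a different category: keep it as a counterfactual.
            path.append((x.copy(), new_label))
            label = new_label
    return path

# Toy usage: standard-normal "training data" in 2D and one far-away query point.
mean = np.zeros(2)
cov_inv = np.eye(2)
path = counterfactual_path([4.0, 0.0], mean, cov_inv, near_thr=1.0, far_thr=9.0)
labels = [lab for _, lab in path]
```

In this toy setting the recorded points form a sequence of counterfactuals, one per category boundary crossed, which is one way to read "moves between different OOD categories". The paper's actual procedure, score, and category definitions may differ.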
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications
