Identifying Spurious Correlations and Correcting them with an Explanation-based Learning
Misgina Tsighe Hagos, Kathleen M. Curran, Brian Mac Namee

TL;DR
This paper introduces a method that detects spurious correlations in image classification models by monitoring how prediction certainty changes under image-level perturbations, and then corrects them with explanation-based learning.
Contribution
It proposes a simple, effective approach to identify spurious correlations and an explanation-based learning method to remove them from trained models.
Findings
Successfully identified overdependence on spurious regions
Removed the learned spurious correlations with an explanation-based learning approach
Demonstrated on an image classification dataset with synthetically generated spurious regions
Abstract
Identifying spurious correlations learned by a trained model is at the core of refining that model and making it trustworthy. We present a simple method to identify spurious correlations that have been learned by a model trained for image classification problems. We apply image-level perturbations and monitor changes in the certainty of predictions made by the trained model. We demonstrate this approach using an image classification dataset that contains images with synthetically generated spurious regions and show that the trained model was overdependent on those regions. Moreover, we remove the learned spurious correlations with an explanation-based learning approach.
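The perturb-and-monitor idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the exact perturbations, so masking a rectangular region with a constant value is an assumption, and `certainty_drop`, `toy_predict_proba`, and the region format are hypothetical names introduced only for this example. The toy model is deliberately built to rely on a small corner patch, standing in for a synthetic spurious region.

```python
import numpy as np

def certainty_drop(predict_proba, image, region, fill_value=0.0):
    """Return (predicted label, drop in that label's probability)
    after masking `region` = (row0, row1, col0, col1) of `image`.
    Masking with a constant is an assumed perturbation, not the paper's."""
    base = predict_proba(image)
    label = int(np.argmax(base))
    perturbed = image.copy()
    r0, r1, c0, c1 = region
    perturbed[r0:r1, c0:c1] = fill_value
    after = predict_proba(perturbed)
    return label, float(base[label] - after[label])

def toy_predict_proba(image):
    """Hypothetical stand-in classifier that looks only at the
    top-left 4x4 patch, mimicking overdependence on a spurious region."""
    score = image[:4, :4].mean()
    p1 = 1.0 / (1.0 + np.exp(-10.0 * (score - 0.5)))
    return np.array([1.0 - p1, p1])

# A 16x16 image whose top-left corner carries the "spurious" patch.
image = np.zeros((16, 16))
image[:4, :4] = 1.0

_, drop_spurious = certainty_drop(toy_predict_proba, image, (0, 4, 0, 4))
_, drop_elsewhere = certainty_drop(toy_predict_proba, image, (8, 12, 8, 12))
print(f"drop when masking the corner patch: {drop_spurious:.3f}")
print(f"drop when masking another region:   {drop_elsewhere:.3f}")
```

A large certainty drop when a region that should be irrelevant to the class is masked, together with little or no drop elsewhere, is the kind of signal that would suggest the model depends on that region.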
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications · Machine Learning and Data Classification
