The Hidden Assumptions Behind Counterfactual Explanations and Principal Reasons
Solon Barocas, Andrew D. Selbst, Manish Raghavan

TL;DR
This paper critically examines the assumptions behind counterfactual and principal reason explanations in machine learning, revealing their limitations, subjective nature, and potential legal and ethical implications.
Contribution
It identifies key overlooked assumptions in feature-highlighting explanations and discusses their impact on the explanations' validity and practical utility.
Findings
The assumption that a recommended feature change maps onto a corresponding real-world action is often invalid.
Features cannot always be made comparable by looking only at the distribution of the training data.
Making explanations more useful is inherently in tension with keeping the model confidential.
Abstract
Counterfactual explanations are gaining prominence within technical, legal, and business circles as a way to explain the decisions of a machine learning model. These explanations share a trait with the long-established "principal reason" explanations required by U.S. credit laws: they both explain a decision by highlighting a set of features deemed most relevant--and withholding others. These "feature-highlighting explanations" have several desirable properties: They place no constraints on model complexity, do not require model disclosure, detail what needed to be different to achieve a different decision, and seem to automate compliance with the law. But they are far more complex and subjective than they appear. In this paper, we demonstrate that the utility of feature-highlighting explanations relies on a number of easily overlooked assumptions: that the recommended change in…
