Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations
Jessica Dai, Sohini Upadhyay, Ulrich Aivodji, Stephen H. Bach, Himabindu Lakkaraju

TL;DR
This paper investigates whether disparities exist in the quality of post hoc explanations across different population subgroups, revealing that complex models and certain explanation methods can lead to unfairness in explanation quality.
Contribution
It introduces a novel evaluation framework to quantitatively measure group-based disparities in explanation quality and empirically analyzes when such disparities occur.
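As a rough illustration of what measuring a group-based disparity in explanation quality could look like, the sketch below averages a per-instance quality score within each subgroup and reports the largest gap between subgroup means. This is not the paper's framework: the function name `group_quality_gap`, the use of a single scalar score per instance, the max-minus-min gap, and the example numbers are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def group_quality_gap(scores, groups):
    """Return (per-group mean score, largest gap between any two group means).

    `scores` holds one hypothetical explanation-quality value per instance
    (for example a fidelity score); `groups` holds the subgroup label of the
    same instance. Both are assumptions, not quantities defined by the paper.
    """
    if len(scores) != len(groups):
        raise ValueError("scores and groups must have the same length")
    if not scores:
        raise ValueError("at least one instance is required")

    # Collect the scores that belong to each subgroup.
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)

    # Average within each subgroup, then compare the best and worst groups.
    means = {group: mean(values) for group, values in by_group.items()}
    gap = max(means.values()) - min(means.values())
    return means, gap


# Made-up scores for two subgroups, purely to show the call.
scores = [0.9, 0.8, 0.7, 0.5, 0.6, 0.4]
groups = ["a", "a", "a", "b", "b", "b"]
means, gap = group_quality_gap(scores, groups)
```

A gap near zero would suggest that explanations are of similar quality across subgroups under the chosen score; a real analysis would also need a significance test, since small samples can produce gaps by chance.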
Findings
Disparities are more common with complex, non-linear models.
Some explanation methods like Integrated Gradients and SHAP exhibit more disparities.
First study to highlight group-based disparities in explanation quality.
Abstract
As post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to ensure that the quality of the resulting explanations is consistently high across various population subgroups including the minority groups. For instance, it should not be the case that explanations associated with instances belonging to a particular gender subgroup (e.g., female) are less accurate than those associated with other genders. However, there is little to no research that assesses if there exist such group-based disparities in the quality of the explanations output by state-of-the-art explanation methods. In this work, we address the aforementioned gaps by initiating the study of identifying group-based disparities in explanation quality. To this end, we first outline the key properties which constitute explanation quality and where…
Taxonomy
Methods: High-Order Consensuses
