Fairwashing: the risk of rationalization
Ulrich Aïvodji, Hiromi Arai, Olivier Fortineau, Sébastien Gambs, Satoshi Hara, Alain Tapp

TL;DR
This paper highlights the risk of fairwashing, where explanation techniques can falsely suggest a model is fair, and introduces LaundryML to generate fair rule lists that mimic unfair models.
Contribution
It demonstrates how explanation methods can be manipulated to falsely portray unfair models as fair and proposes LaundryML to generate interpretable, less unfair rule lists.
Findings
Explanation methods can be used to rationalize unfair models as fair.
LaundryML produces rule lists with high fidelity to black-box models.
The generated rule lists are considerably less unfair than the black-box models they approximate, while keeping high fidelity to them.
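The findings above turn on two quantities: how closely a surrogate explanation agrees with the black box (fidelity) and how unfair each model's predictions are. The following is a minimal sketch of both, not the paper's implementation. It assumes binary predictions, a binary protected attribute, and demographic parity as the fairness metric (this summary does not name the metric used). The function names and the toy data are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def fidelity(black_box: Sequence[int], surrogate: Sequence[int]) -> float:
    """Fraction of inputs on which the surrogate agrees with the black box."""
    if len(black_box) != len(surrogate) or len(black_box) == 0:
        raise ValueError("prediction lists must be non-empty and the same length")
    agree = sum(b == s for b, s in zip(black_box, surrogate))
    return agree / len(black_box)


def demographic_parity_gap(preds: Sequence[int], protected: Sequence[int]) -> float:
    """Absolute difference in positive-prediction rates between the two groups.

    Assumption: demographic parity stands in for "a given fairness metric".
    """
    in_group = [p for p, g in zip(preds, protected) if g == 1]
    out_group = [p for p, g in zip(preds, protected) if g == 0]
    if not in_group or not out_group:
        raise ValueError("both groups must be present")
    return abs(sum(in_group) / len(in_group) - sum(out_group) / len(out_group))


# Toy data, made up for illustration (not taken from the paper).
protected = [1, 1, 1, 1, 0, 0, 0, 0]
black_box = [0, 0, 0, 1, 1, 1, 1, 0]   # favors the unprotected group
surrogate = [0, 0, 1, 1, 1, 1, 0, 0]   # an explanation that looks fair

print("fidelity:", fidelity(black_box, surrogate))
print("black-box gap:", demographic_parity_gap(black_box, protected))
print("surrogate gap:", demographic_parity_gap(surrogate, protected))
```

On this toy data the surrogate agrees with the black box on 6 of 8 inputs yet shows no parity gap, while the black box itself has a gap of 0.5. That is the fairwashing risk in miniature: an explanation can look both faithful and fair while the model it explains is not.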
Abstract
Black-box explanation is the problem of explaining how a machine learning model -- whose internal logic is hidden to the auditor and generally complex -- produces its outcomes. Current approaches for solving this problem include model explanation, outcome explanation as well as model inspection. While these techniques can be beneficial by providing interpretability, they can be used in a negative manner to perform fairwashing, which we define as promoting the false perception that a machine learning model respects some ethical values. In particular, we demonstrate that it is possible to systematically rationalize decisions taken by an unfair black-box model using the model explanation as well as the outcome explanation approaches with a given fairness metric. Our solution, LaundryML, is based on a regularized rule list enumeration algorithm whose objective is to search for fair rule…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
