Fairwashing: the risk of rationalization

Ulrich Aïvodji; Hiromi Arai; Olivier Fortineau; Sébastien Gambs; Satoshi Hara; Alain Tapp

arXiv:1901.09749 · cs.LG · May 16, 2019 · 19 cites

TL;DR

This paper highlights the risk of fairwashing, where explanation techniques can falsely suggest a model is fair, and introduces LaundryML to generate fair rule lists that mimic unfair models.

Contribution

It demonstrates how explanation methods can be manipulated to falsely portray unfair models as fair and proposes LaundryML to generate interpretable, less unfair rule lists.

Findings

01 Explanation methods can be used to rationalize the decisions of an unfair model so that it appears fair.

02 LaundryML produces rule lists with high fidelity to black-box models.

03 The generated rule lists are considerably less unfair while retaining high fidelity to the black-box model.

Abstract

Black-box explanation is the problem of explaining how a machine learning model -- whose internal logic is hidden to the auditor and generally complex -- produces its outcomes. Current approaches for solving this problem include model explanation, outcome explanation as well as model inspection. While these techniques can be beneficial by providing interpretability, they can be used in a negative manner to perform fairwashing, which we define as promoting the false perception that a machine learning model respects some ethical values. In particular, we demonstrate that it is possible to systematically rationalize decisions taken by an unfair black-box model using the model explanation as well as the outcome explanation approaches with a given fairness metric. Our solution, LaundryML, is based on a regularized rule list enumeration algorithm whose objective is to search for fair rule…
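To make the idea of searching for fair rule lists that approximate an unfair black box more concrete, here is a minimal Python sketch of how a candidate surrogate might be scored. It is an illustration under stated assumptions, not the LaundryML implementation: the function names, the choice of demographic parity as the fairness metric, the weights `beta` and `lam`, and the toy data are all hypothetical. The score trades off disagreement with the black box, unfairness of the surrogate, and rule-list length.

```python
from typing import Sequence


def fidelity(surrogate_preds: Sequence[int], black_box_preds: Sequence[int]) -> float:
    """Fraction of inputs on which the surrogate agrees with the black box."""
    agree = sum(s == b for s, b in zip(surrogate_preds, black_box_preds))
    return agree / len(black_box_preds)


def demographic_parity_gap(preds: Sequence[int], protected: Sequence[int]) -> float:
    """Absolute difference in positive-prediction rates between the two groups."""
    group1 = [p for p, g in zip(preds, protected) if g == 1]
    group0 = [p for p, g in zip(preds, protected) if g == 0]
    if not group1 or not group0:
        raise ValueError("both groups must contain at least one example")
    return abs(sum(group1) / len(group1) - sum(group0) / len(group0))


def regularized_objective(
    surrogate_preds: Sequence[int],
    black_box_preds: Sequence[int],
    protected: Sequence[int],
    n_rules: int,
    beta: float = 0.5,   # hypothetical weight on unfairness
    lam: float = 0.01,   # hypothetical penalty per rule
) -> float:
    """Lower is better: infidelity + unfairness + length penalty (assumed form)."""
    infidelity = 1.0 - fidelity(surrogate_preds, black_box_preds)
    unfairness = demographic_parity_gap(surrogate_preds, protected)
    return (1.0 - beta) * infidelity + beta * unfairness + lam * n_rules


# Toy data (made up): black-box labels and a binary protected attribute.
black_box = [1, 1, 1, 0, 0, 0, 1, 0]
protected = [0, 0, 0, 0, 1, 1, 1, 1]
# A hypothetical candidate rule list's predictions on the same inputs.
candidate = [1, 1, 0, 0, 0, 1, 1, 0]

score_candidate = regularized_objective(candidate, black_box, protected, n_rules=2)
score_copy = regularized_objective(black_box, black_box, protected, n_rules=3)
```

On this toy data the candidate agrees with the black box on 6 of 8 inputs yet has no demographic parity gap, so it scores lower (better) than an exact copy of the black box. An enumeration procedure would compute a score like this for many candidate rule lists and keep the best ones.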

Code & Models

Repositories

aivodji/LaundryML (Official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification