"How do I fool you?": Manipulating User Trust via Misleading Black Box Explanations
Himabindu Lakkaraju, Osbert Bastani

TL;DR
This paper investigates how misleading explanations of black box machine learning models can manipulate user trust. It proposes a theoretical framework for such explanations and demonstrates through a user study that users can be deliberately misled into trusting a problematic black box.
Contribution
It introduces a novel theoretical framework for understanding and generating misleading explanations, and shows empirically, through a user study with domain experts, that these explanations can manipulate user trust in black box models.
Findings
Misleading explanations can significantly alter user trust.
The theoretical framework enables systematic generation of misleading explanations.
Even domain experts can be misled into trusting a problematic black box.
Abstract
As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a human interpretable manner. It has recently become apparent that a high-fidelity explanation of a black box ML model may not accurately reflect the biases in the black box. As a consequence, explanations have the potential to mislead human users into trusting a problematic black box. In this work, we rigorously explore the notion of misleading explanations and how they influence user trust in black-box models. More specifically, we propose a novel theoretical framework for understanding and generating misleading explanations, and carry out a user study with domain experts to demonstrate how these explanations can be used to mislead users. Our work is the first to…
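The abstract's central observation, that a high-fidelity explanation need not reflect the biases of the black box, can be illustrated with a toy sketch. Everything below is a hypothetical assumption made for illustration: the feature names `sensitive_group` and `proxy_feature`, the synthetic records, and the helper functions are not the paper's dataset, framework, or method. The sketch only shows how an explanation that never mentions a sensitive attribute can still agree with a biased model on most inputs when a correlated proxy exists.

```python
# Toy illustration only: feature names, data, and functions are hypothetical
# assumptions, not the paper's dataset or explanation framework.

def black_box(record):
    """A biased 'black box': decides purely on a sensitive attribute."""
    return 1 if record["sensitive_group"] == 1 else 0

def surrogate_explanation(record):
    """A rule-based 'explanation' that never mentions the sensitive
    attribute and relies on a correlated proxy feature instead."""
    return 1 if record["proxy_feature"] == 1 else 0

def fidelity(model, explanation, data):
    """Fraction of records on which the explanation agrees with the model."""
    agree = sum(model(r) == explanation(r) for r in data)
    return agree / len(data)

def make_data(n=100, mismatch_every=20):
    """Build records where the proxy equals the sensitive attribute,
    except on every `mismatch_every`-th record."""
    data = []
    for i in range(n):
        sensitive = i % 2
        proxy = sensitive if (i + 1) % mismatch_every else 1 - sensitive
        data.append({"sensitive_group": sensitive, "proxy_feature": proxy})
    return data

data = make_data()
score = fidelity(black_box, surrogate_explanation, data)
print(f"fidelity = {score:.2f}")  # prints "fidelity = 0.95"
```

In this sketch the explanation matches the black box on 95 of 100 records, yet a user reading it would see only the proxy rule and no sign that the underlying model depends on the sensitive attribute.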
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data
