Why You Should Not Trust Interpretations in Machine Learning: Adversarial Attacks on Partial Dependence Plots
Xi Xin, Giles Hooker, Fei Huang

TL;DR
This paper shows that interpretation tools such as partial dependence (PD) plots can be manipulated through adversarial attacks, misleading stakeholders about a model's discriminatory behavior while leaving most of the model's predictions unchanged.
Contribution
It introduces an adversarial framework that generates deceptive PD plots, exposing a vulnerability in permutation-based interpretation methods for machine learning models.
Findings
Adversarial modifications can hide model discrimination in PD plots.
Deceptive PD plots can be produced without significantly altering model predictions.
The framework is demonstrated on real-world datasets, including an auto insurance claims dataset and COMPAS.
Abstract
The adoption of artificial intelligence (AI) across industries has led to the widespread use of complex black-box models and interpretation tools for decision making. This paper proposes an adversarial framework to uncover the vulnerability of permutation-based interpretation methods for machine learning tasks, with a particular focus on partial dependence (PD) plots. This adversarial framework modifies the original black box model to manipulate its predictions for instances in the extrapolation domain. As a result, it produces deceptive PD plots that can conceal discriminatory behaviors while preserving most of the original model's predictions. This framework can produce multiple fooled PD plots via a single model. By using real-world datasets including an auto insurance claims dataset and COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) dataset, our…
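The mechanism described in the abstract can be illustrated with a small synthetic sketch. This is not the authors' framework: the data, the `original_model` function, the hand-written `in_extrapolation_domain` rule, and the replacement prediction used off the data manifold are all assumptions made for illustration. The point it tries to show is that a PD plot averages predictions over points where one feature has been overwritten with a grid value, and when features are correlated most of those points fall outside the observed data, so a model altered only there can show a flat PD curve while agreeing with the original on real instances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): feature 0 plays the role of a sensitive
# feature, feature 1 is a strongly correlated proxy.
n = 2000
x0 = rng.normal(size=n)
x1 = x0 + 0.1 * rng.normal(size=n)
X = np.column_stack([x0, x1])


def original_model(X):
    # Hypothetical black box that relies heavily on feature 0.
    return 2.0 * X[:, 0] + 0.5 * X[:, 1]


def in_extrapolation_domain(X, tol=0.5):
    # Toy rule (assumption): observed rows have feature 0 close to
    # feature 1, so rows far from that line are treated as off-manifold.
    return np.abs(X[:, 0] - X[:, 1]) > tol


def fooled_model(X):
    # Keep the original prediction on the data manifold; off-manifold,
    # return a value that does not depend on feature 0.
    out = original_model(X)
    mask = in_extrapolation_domain(X)
    out[mask] = 2.5 * X[mask, 1]
    return out


def partial_dependence(model, X, feature, grid):
    # Standard PD estimate: fix one feature at each grid value for every
    # row, then average the model's predictions.
    values = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v
        values.append(model(Xv).mean())
    return np.array(values)


grid = np.linspace(-2.0, 2.0, 9)
pd_original = partial_dependence(original_model, X, 0, grid)
pd_fooled = partial_dependence(fooled_model, X, 0, grid)

# Share of observed rows whose prediction differs between the two models.
changed = np.mean(fooled_model(X) != original_model(X))

print("PD range, original model:", np.ptp(pd_original))
print("PD range, fooled model:  ", np.ptp(pd_fooled))
print("Share of observed predictions changed:", changed)
```

In this toy setup the original model's PD curve for feature 0 rises steeply across the grid, while the fooled model's curve is close to flat, even though the two models agree on nearly every observed row. A real attack would need a learned way to separate observed from extrapolated points; the fixed threshold here only stands in for that step.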
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI
