Hacking a surrogate model approach to XAI
Alexander Wilhelm, Katharina A. Zweig

TL;DR
This paper investigates how effective surrogate models are as an explainable AI technique, showing that discriminatory behavior of a black box model can be hidden at various levels of the surrogate, which undermines transparency and interpretability.
Contribution
The study demonstrates how surrogate models can obscure discrimination by hiding it at different levels, challenging assumptions about their interpretability in XAI.
Findings
Discrimination can be hidden at arbitrary levels in surrogate decision trees.
Surrogate models can be manipulated to obscure discriminatory patterns.
The approach generalizes to other surrogate models.
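To make "level" concrete, the sketch below checks how deep in a decision tree a given feature first appears. It is a toy illustration, not the authors' code: the nested-dict tree format, the helper `first_depth`, and the feature names `income`, `age`, and `group` are all hypothetical.

```python
def first_depth(node, feature, depth=0):
    """Return the shallowest depth at which `feature` is split on, or None."""
    if "feature" not in node:  # leaf node: nothing is split here
        return None
    if node["feature"] == feature:
        return depth
    # Search both subtrees and keep the shallowest hit, if any.
    hits = [first_depth(child, feature, depth + 1)
            for child in (node["left"], node["right"])]
    hits = [h for h in hits if h is not None]
    return min(hits) if hits else None


# Hypothetical surrogate tree: the sensitive attribute "group" only
# shows up two levels below the root.
tree = {
    "feature": "income",
    "left": {"label": 0},
    "right": {
        "feature": "age",
        "left": {"label": 1},
        "right": {
            "feature": "group",
            "left": {"label": 1},
            "right": {"label": 0},
        },
    },
}

depth_of_group = first_depth(tree, "group")
print(depth_of_group)
```

A reader who inspects only the top levels of such a tree would not see the split on `group`, which is the kind of situation the findings above describe.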
Abstract
In recent years, the number of new applications for highly complex AI systems has risen significantly. Algorithmic decision-making systems (ADMs) are one such application, in which an AI system replaces the decision-making process of a human expert. As one approach to ensuring the fairness and transparency of such systems, explainable AI (XAI) has become more important. One way to achieve explainability is through surrogate models, i.e., the idea of training a new, simpler machine learning model on the input-output relationship of a black box model. The simpler machine learning model could, for example, be a decision tree, which is thought to be intuitively understandable by humans. However, there is not much insight into how well the surrogate model approximates the black box. Our main assumption is that a good surrogate model approach should be able to bring such a discriminating behavior…
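The surrogate idea described in the abstract can be sketched in a few lines. This is a minimal toy example under stated assumptions, not the paper's experimental setup: `black_box` is a hypothetical opaque model that applies a stricter income threshold to one group, and `fit_stump` is a hypothetical helper that fits a depth-1 tree to the black box's outputs. Fidelity here means the fraction of inputs on which the surrogate and the black box agree.

```python
import random


def black_box(income, group):
    """Hypothetical opaque model: approves on income, stricter for group 1."""
    threshold = 50 if group == 0 else 60
    return int(income >= threshold)


def fit_stump(rows, labels):
    """Fit a depth-1 tree (one feature, one threshold) that best matches labels."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            preds = [int(r[f] >= t) for r in rows]
            agree = sum(p == y for p, y in zip(preds, labels))
            if best is None or agree > best[0]:
                best = (agree, f, t)
    _, feature, threshold = best
    return feature, threshold


random.seed(0)
# Each row is (income, group); the data is synthetic.
rows = [(random.randint(0, 100), random.randint(0, 1)) for _ in range(500)]

# The surrogate is trained on the black box's outputs, not on ground truth.
bb_labels = [black_box(income, group) for income, group in rows]
feature, threshold = fit_stump(rows, bb_labels)

surrogate = [int(r[feature] >= threshold) for r in rows]
fidelity = sum(s == y for s, y in zip(surrogate, bb_labels)) / len(rows)
print(f"split feature index: {feature}, fidelity: {fidelity:.2f}")
```

In this toy setting the stump splits on income (feature index 0) and agrees with the black box on most inputs, yet it never mentions `group`. A high fidelity score alone therefore does not show whether the surrogate exposes the black box's discriminatory rule.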
Taxonomy
Topics: Simulation Techniques and Applications · Advanced Database Systems and Queries · Distributed and Parallel Computing Systems
Methods: Softmax · Attention Is All You Need
