Provably Better Explanations with Optimized Aggregation of Feature Attributions
Thomas Decker, Ananta R. Bhattarai, Jindong Gu, Volker Tresp, Florian Buettner

TL;DR
This paper introduces a method to improve feature attribution explanations by optimally combining multiple attribution methods, resulting in more robust and faithful explanations for machine learning models.
Contribution
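The paper proposes a method for deriving optimal convex combinations of feature attributions, with provable improvements in quality criteria such as robustness or faithfulness. The combined explanations outperform individual attribution methods and baselines.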
Findings
Combined attributions improve robustness and faithfulness.
The method outperforms individual attribution techniques.
Experimental results validate the approach across models and methods.
Abstract
Using feature attributions for post-hoc explanations is a common practice to understand and verify the predictions of opaque machine learning models. Despite the numerous techniques available, individual methods often produce inconsistent and unstable results, putting their overall reliability into question. In this work, we aim to systematically improve the quality of feature attributions by combining multiple explanations across distinct methods or their variations. For this purpose, we propose a novel approach to derive optimal convex combinations of feature attributions that yield provable improvements of desired quality criteria such as robustness or faithfulness to the model behavior. Through extensive experiments involving various model architectures and popular feature attribution techniques, we demonstrate that our combination strategy consistently outperforms individual…
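To make the idea of an optimal convex combination concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it assumes a hypothetical quadratic instability criterion (how much the combined attribution varies across perturbed copies of an input) and minimizes it over the probability simplex with projected gradient descent. The function names `project_to_simplex`, `instability_matrix`, `optimal_convex_weights`, and `combine` are illustrative and do not come from the paper.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def instability_matrix(attr_runs):
    """Assumed criterion: Gram matrix of attribution deviations.

    attr_runs has shape (n_perturbations, k_methods, n_features).
    Q[i, j] is the mean inner product between the deviations of
    methods i and j, so w @ Q @ w is the mean squared deviation
    of the combined attribution.
    """
    dev = attr_runs - attr_runs.mean(axis=0, keepdims=True)
    return np.einsum("pif,pjf->ij", dev, dev) / attr_runs.shape[0]

def optimal_convex_weights(Q, steps=2000):
    """Minimize w^T Q w over the simplex by projected gradient descent."""
    k = Q.shape[0]
    w = np.full(k, 1.0 / k)  # start from the uniform average
    lr = 1.0 / (2.0 * np.linalg.norm(Q, 2) + 1e-12)  # 1 / Lipschitz constant
    for _ in range(steps):
        w = project_to_simplex(w - lr * 2.0 * (Q @ w))
    return w

def combine(attributions, w):
    """Convex combination of k attribution vectors, shape (k, n_features)."""
    return np.tensordot(w, attributions, axes=1)
```

Because every individual method corresponds to a vertex of the simplex, the minimizer of this toy objective can never score worse than the best single method under the same criterion. That is the flavor of guarantee the abstract describes, although the paper's actual criteria and optimization procedure may differ.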
Taxonomy
Topics: Scientific Computing and Data Management
