Distributing Synergy Functions: Unifying Game-Theoretic Interaction Methods for Machine-Learning Explainability
Daniel Lundstrom, Meisam Razaviyayn

TL;DR
This paper introduces a unifying framework for game-theoretic attribution and interaction methods in machine learning, defining a unique way to distribute feature synergies and analyzing gradient-based approaches for model explainability.
Contribution
It presents a comprehensive framework that unifies various attribution and interaction methods, characterizes gradient-based approaches, and highlights the importance of goal-oriented method development.
Findings
Under modest assumptions, a unique full account of feature interactions (synergies) exists in the continuous-input setting.
Gradient-based methods are characterized by how they act on monomials.
Combining the synergy account with additional criteria defines each attribution and interaction method uniquely.
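To make the idea of distributing synergies concrete, here is a minimal sketch. It is not the paper's implementation, and it assumes a simplified discrete setup: features outside a coalition are replaced by a zero baseline, synergies are computed with the standard Möbius transform (Harsanyi dividends), and the Shapley value is recovered by splitting each synergy equally among its members. The toy `model`, the input `x`, and all function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def synergies(value, players):
    """Moebius transform: syn(S) = sum over T subset of S of (-1)^(|S|-|T|) * value(T)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * value(T) for T in subsets(S))
        for S in subsets(players)
    }


def shapley_from_synergies(syn, players):
    """Split each synergy equally among the features in its coalition."""
    return {i: sum(w / len(S) for S, w in syn.items() if i in S) for i in players}


def shapley_direct(value, players):
    """Classical Shapley formula, used here only as a cross-check."""
    n = len(players)
    result = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for S in subsets(others):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (value(S | {i}) - value(S))
        result[i] = total
    return result


# Hypothetical toy model with one pairwise interaction term (x1 * x2).
def model(x1, x2, x3):
    return 2.0 * x1 + 3.0 * x1 * x2 + x3


x = {1: 1.0, 2: 2.0, 3: 4.0}  # explained input (assumed)
players = (1, 2, 3)


def value(S):
    """Model output with features outside S set to the zero baseline (assumed)."""
    return model(*(x[i] if i in S else 0.0 for i in players))


syn = synergies(value, players)
phi = shapley_from_synergies(syn, players)
phi_check = shapley_direct(value, players)

for S, w in syn.items():
    if w != 0:
        print(sorted(S), w)
print(phi)
```

In this toy case the only nonzero synergies belong to {1}, {3}, and the pair {1, 2}, which mirrors the monomials of the model. Other attribution methods can be read as different rules for dividing those same synergies among features.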
Abstract
Deep learning has revolutionized many areas of machine learning, from computer vision to natural language processing, but these high-performance models are generally "black box." Explaining such models would improve transparency and trust in AI-powered decision making and is necessary for understanding other practical needs such as robustness and fairness. A popular means of enhancing model transparency is to quantify how individual inputs contribute to model outputs (called attributions) and the magnitude of interactions between groups of inputs. A growing number of these methods import concepts and results from game theory to produce attributions and interactions. This work presents a unifying framework for game-theory-inspired attribution and $k^\text{th}$-order interaction methods. We show that, given modest assumptions, a unique full account of interactions between features, called…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks · Adversarial Robustness in Machine Learning
