Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings
Jan Macdonald, Mathieu Besançon, Sebastian Pokutta

TL;DR
This paper reformulates Rate-Distortion Explanations (RDE) as a constrained optimization problem solved with Frank-Wolfe algorithms, yielding sparse and ordered relevance maps for neural network interpretability that empirically outperform standard RDE and other baseline methods.
Contribution
It reformulates RDE as a constrained optimization problem, which gives precise control over the sparsity of relevance maps and enables novel multi-rate and relevance-ordering variants.
Findings
Reformulated RDE with Frank-Wolfe for sparsity control
Proposed multi-rate and relevance-ordering RDE variants
Empirically outperforms standard RDE and baselines
Abstract
We study the effects of constrained optimization formulations and Frank-Wolfe algorithms for obtaining interpretable neural network predictions. Reformulating the Rate-Distortion Explanations (RDE) method for relevance attribution as a constrained optimization problem provides precise control over the sparsity of relevance maps. This enables a novel multi-rate as well as a relevance-ordering variant of RDE that both empirically outperform standard RDE and other baseline methods in a well-established comparison test. We showcase several deterministic and stochastic variants of the Frank-Wolfe algorithm and their effectiveness for RDE.
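To make the idea of a sparsity-constrained formulation concrete, the following is a minimal Frank-Wolfe sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the feasible set {s in [0,1]^n : sum(s) <= k}, the toy quadratic "distortion", the `weights` vector, the step size 2/(t+2), and all function names are hypothetical choices made for this example. In actual RDE the distortion would be measured through the neural network's output rather than through a closed-form function.

```python
def lmo_sparse_box(grad, k):
    """Linear minimization oracle over {s in [0,1]^n : sum(s) <= k}.

    Returns a vertex v minimizing <grad, v>: set v_i = 1 on at most k
    coordinates with the most negative gradient entries, 0 elsewhere.
    """
    order = sorted(range(len(grad)), key=lambda i: grad[i])
    v = [0.0] * len(grad)
    for i in order[:k]:
        if grad[i] < 0:
            v[i] = 1.0
    return v


def frank_wolfe(grad_fn, n, k, steps=200):
    """Vanilla Frank-Wolfe with the standard 2/(t+2) step size."""
    s = [0.0] * n  # the all-zero map is feasible
    for t in range(steps):
        g = grad_fn(s)
        v = lmo_sparse_box(g, k)
        gamma = 2.0 / (t + 2.0)
        # A convex combination of feasible points stays feasible,
        # so no projection step is needed.
        s = [(1.0 - gamma) * si + gamma * vi for si, vi in zip(s, v)]
    return s


# Toy stand-in for a distortion: f(s) = 0.5 * sum_i w_i * (1 - s_i)^2.
# A large w_i means that dropping component i hurts a lot (assumed values).
weights = [5.0, 0.1, 3.0, 0.2, 4.0, 0.05]


def toy_grad(s):
    return [-w * (1.0 - si) for w, si in zip(weights, s)]


s = frank_wolfe(toy_grad, n=len(weights), k=3)
print([round(x, 3) for x in s])
```

Because every iterate is a convex combination of vertices of the feasible set, the budget `sum(s) <= k` holds at every step, which is the sense in which a constrained formulation gives direct control over sparsity. Sorting the components of `s` in decreasing order yields a simple ranking for this toy problem; it is not the relevance-ordering variant described in the abstract.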
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
