Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings

Jan Macdonald; Mathieu Besançon; Sebastian Pokutta

arXiv:2110.08105 · cs.LG · February 1, 2022 · 1 citation

TL;DR

This paper recasts relevance attribution for neural networks as a constrained optimization problem solved with Frank-Wolfe algorithms. The resulting sparse relevance maps and relevance orderings empirically outperform standard RDE and other baseline methods in a well-established comparison test.

Contribution

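The paper reformulates Rate-Distortion Explanations (RDE) as a constrained optimization problem, which gives precise control over the sparsity of relevance maps and enables multi-rate and relevance-ordering variants that empirically outperform standard RDE.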

Findings

01

Reformulated RDE with Frank-Wolfe for sparsity control

02

Proposed multi-rate and relevance-ordering RDE variants

03

Empirically outperforms standard RDE and baselines

Abstract

We study the effects of constrained optimization formulations and Frank-Wolfe algorithms for obtaining interpretable neural network predictions. Reformulating the Rate-Distortion Explanations (RDE) method for relevance attribution as a constrained optimization problem provides precise control over the sparsity of relevance maps. This enables a novel multi-rate as well as a relevance-ordering variant of RDE that both empirically outperform standard RDE and other baseline methods in a well-established comparison test. We showcase several deterministic and stochastic variants of the Frank-Wolfe algorithm and their effectiveness for RDE.
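
To make the idea of a sparsity-constrained Frank-Wolfe solve concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes the constraint set is the capped simplex {s in [0,1]^n : sum(s) <= k}, and it replaces the network-based distortion with a toy quadratic so the example runs on its own. The names `lmo_k_sparse`, `frank_wolfe`, `distortion`, and the weight vector `w` are hypothetical and chosen only for illustration.

```python
import numpy as np

def lmo_k_sparse(grad, k):
    """Linear minimization oracle over {s in [0,1]^n : sum(s) <= k} (assumed set).

    Returns the vertex v minimizing <grad, v>: switch on at most k
    coordinates, and only those with a negative gradient entry.
    """
    v = np.zeros_like(grad)
    idx = np.argsort(grad)[:k]        # k smallest (most negative) entries
    v[idx[grad[idx] < 0]] = 1.0       # skip entries that would not decrease <grad, v>
    return v

def frank_wolfe(grad_fn, n, k, steps=200):
    """Vanilla Frank-Wolfe with the open-loop step size 2 / (t + 2)."""
    s = np.zeros(n)                   # feasible start: nothing marked relevant
    for t in range(steps):
        v = lmo_k_sparse(grad_fn(s), k)
        gamma = 2.0 / (t + 2.0)
        s = s + gamma * (v - s)       # convex combination, so s stays feasible
    return s

# Toy stand-in for the distortion (an assumption, not the paper's objective):
# leaving component i unselected costs roughly w[i]**2.
w = np.array([3.0, 0.5, 2.0, 0.1, 1.0, 0.2, 2.5, 0.3])

def distortion(s):
    return 0.5 * np.sum((w * (1.0 - s)) ** 2)

def distortion_grad(s):
    return -(w ** 2) * (1.0 - s)

k = 3
s_hat = frank_wolfe(distortion_grad, n=w.size, k=k)
print("relevance scores:", np.round(s_hat, 3))
print("budget used:", s_hat.sum(), "of", k)
```

Because every iterate is a convex combination of feasible vertices, the budget `sum(s) <= k` holds at every step without a projection or a penalty weight to tune. This is the sense in which a constrained formulation gives direct control over sparsity.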

Code & Models

Repositories

zib-iol/fw-rde (tf, Official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning