Interpretability Guarantees with Merlin-Arthur Classifiers

Stephan Wäldchen; Kartikey Sharma; Berkant Turan; Max Zimmer; Sebastian Pokutta

arXiv:2206.00759 · cs.LG · March 25, 2024

TL;DR

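This paper introduces Merlin-Arthur classifiers, which offer provable interpretability guarantees even for complex models such as neural networks. The approach uses an interactive protocol and expresses the guarantees through measurable metrics, soundness and completeness, which bound how much information the selected features carry about the classification decision.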
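To make the interaction concrete, here is a toy Python sketch of the game under our own simplifying assumptions: binary labels +1/-1, Arthur may answer 0 ("I don't know"), a feature is a set of k revealed input coordinates, and every name (`arthur`, `merlin`, `morgana`, `run_protocol`, `DATA`, `K`) is hypothetical. Merlin searches for a feature that convinces Arthur of the true class, and Morgana for one that convinces him of the wrong class. Arthur is hand-coded here, whereas the paper targets learned agents such as neural networks. The exhaustive search also makes both provers optimal, which the paper's guarantees do not require.

```python
from itertools import combinations

# Hypothetical toy dataset: 4 binary features per input, labels in {+1, -1}.
DATA = [
    ((1, 1, 0, 0), +1),
    ((1, 0, 1, 0), +1),
    ((0, 1, 0, 1), -1),
    ((0, 0, 1, 1), -1),
]
K = 1  # hypothetical number of coordinates a prover may reveal


def arthur(revealed):
    """Hand-coded toy verifier; `revealed` maps feature index -> value.

    Returns +1, -1, or 0 ("I don't know").
    """
    if revealed.get(0) == 1:
        return +1
    if revealed.get(3) == 1:
        return -1
    return 0


def candidate_features(x, k):
    """Yield every way of revealing exactly k coordinates of x."""
    for idx in combinations(range(len(x)), k):
        yield {i: x[i] for i in idx}


def merlin(x, label, k):
    """Cooperative prover: find a feature that makes Arthur output the true label."""
    for phi in candidate_features(x, k):
        if arthur(phi) == label:
            return phi
    return {}  # assumption: reveal nothing if no convincing feature exists


def morgana(x, label, k):
    """Adversarial prover: find a feature that makes Arthur output the wrong label."""
    for phi in candidate_features(x, k):
        if arthur(phi) == -label:
            return phi
    return {}


def run_protocol(data, k):
    """Per input: (true label, Arthur's answer with Merlin, Arthur's answer with Morgana)."""
    return [
        (label, arthur(merlin(x, label, k)), arthur(morgana(x, label, k)))
        for x, label in data
    ]


if __name__ == "__main__":
    for outcome in run_protocol(DATA, K):
        print(outcome)
```

On this toy dataset Merlin always finds a convincing feature and Morgana never does, so Arthur answers correctly with Merlin and abstains against Morgana on every input.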

Contribution

It presents a novel interactive multi-agent classification framework whose interpretability guarantees rely neither on optimal agents nor on the assumption that features are distributed independently.

Findings

01. Provable lower bounds on the mutual information between selected features and the classification decision.

02. Evaluation on two small-scale datasets confirms high mutual information.

03. The new concept of Asymmetric Feature Correlation captures the kind of feature correlations that make interpretability guarantees difficult.
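The quantity being bounded is ordinary mutual information. As a reference point, the hypothetical helper below computes a plug-in estimate of I(A; B) in bits from paired samples, for example A = whether a selected feature appears in an input and B = that input's class. This pairing is an illustrative assumption, and the estimator is not the paper's bound, which is expressed in terms of soundness and completeness rather than estimated from counts.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Plug-in estimate of I(A; B) in bits from a list of (a, b) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2(p(a,b) / (p(a) p(b))), rewritten in terms of raw counts
        mi += (c / n) * log2(c * n / (count_a[a] * count_b[b]))
    return mi


if __name__ == "__main__":
    # Feature presence fully determines a balanced binary label: 1 bit.
    print(mutual_information([(1, +1), (1, +1), (0, -1), (0, -1)]))
    # Feature presence independent of the label: 0 bits.
    print(mutual_information([(1, +1), (1, -1), (0, +1), (0, -1)]))
```

Since I(A; B) ≤ H(B), a balanced binary label caps the value at 1 bit; a lower bound close to that cap means the feature nearly determines the class.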

Abstract

We propose an interactive multi-agent classifier that provides provable interpretability guarantees even for complex agents such as neural networks. These guarantees consist of lower bounds on the mutual information between selected features and the classification decision. Our results are inspired by the Merlin-Arthur protocol from Interactive Proof Systems and express these bounds in terms of measurable metrics such as soundness and completeness. Compared to existing interactive setups, we rely neither on optimal agents nor on the assumption that features are distributed independently. Instead, we use the relative strength of the agents as well as the new concept of Asymmetric Feature Correlation which captures the precise kind of correlations that make interpretability guarantees difficult. We evaluate our results on two small-scale datasets where high mutual information can be…
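Both metrics are frequencies over a dataset. The sketch below estimates them from logged outcomes, assuming binary labels +1/-1, Arthur's answers in {+1, -1, 0} with 0 meaning "I don't know", and the usual interactive-proof reading: the completeness error ε_c is the fraction of inputs on which Arthur, shown Merlin's feature, fails to output the true class, and the soundness error ε_s is the fraction on which Morgana's feature makes Arthur output the wrong class. Abstaining is not counted as being fooled. The function name and record layout are hypothetical, and the paper's formal definitions may differ in detail.

```python
def completeness_soundness(records):
    """Estimate (eps_c, eps_s) from (true_label, answer_with_merlin, answer_with_morgana).

    Labels are +1/-1; answers are +1, -1 or 0, where 0 means "I don't know".
    """
    records = list(records)
    if not records:
        raise ValueError("need at least one record")
    n = len(records)
    # Completeness error: Arthur misses the true class even with Merlin's help.
    merlin_failures = sum(1 for label, with_merlin, _ in records if with_merlin != label)
    # Soundness error: Morgana makes Arthur output the opposite class (abstaining is safe).
    morgana_successes = sum(1 for label, _, with_morgana in records if with_morgana == -label)
    return merlin_failures / n, morgana_successes / n
```

For example, `completeness_soundness([(1, 1, 0), (1, 0, 0), (-1, -1, 1), (-1, -1, 0)])` returns `(0.25, 0.25)`: Merlin's feature fails to convince Arthur once, and Morgana fools Arthur once.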

Code & Models

Repositories

zib-iol/merlin-arthur-classifiers
PyTorch · Official

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Data Stream Mining Techniques · Adversarial Robustness in Machine Learning