A Rate-Distortion Framework for Explaining Black-box Model Decisions

Stefan Kolek; Duc Anh Nguyen; Ron Levie; Joan Bruna; Gitta Kutyniok

arXiv:2110.08252 · cs.LG · October 19, 2021

TL;DR

The paper introduces the Rate-Distortion Explanation (RDE) framework, a mathematically well-founded, perturbation-based method for explaining black-box model decisions across data modalities, including images, audio, and physical simulations of urban environments.

Contribution

The paper proposes an explanation framework based on rate-distortion theory that applies to any differentiable pre-trained model, such as a neural network.

Findings

01. Demonstrates adaptability to multiple data modalities.

02. Provides a mathematically well-founded explanation method.

03. Applicable to neural networks and other differentiable models.

Abstract

We present the Rate-Distortion Explanation (RDE) framework, a mathematically well-founded method for explaining black-box model decisions. The framework is based on perturbations of the target input signal and applies to any differentiable pre-trained model such as neural networks. Our experiments demonstrate the framework's adaptability to diverse data modalities, particularly images, audio, and physical simulations of urban environments.
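
The abstract describes a perturbation-based method for differentiable models but gives no formulas. As an illustration only, the following minimal sketch shows one way such an explanation could be set up: a relevance mask `s` in [0, 1] keeps some input components and replaces the rest with random noise, and the mask is chosen to keep the model output close to the original while staying sparse. Everything here is an assumption made for the example and is not taken from the text above: the function name `rde_mask`, the toy linear model `f(x) = w . x` standing in for a pre-trained network, the Gaussian noise, the squared-error distortion, the L1 penalty, and all default parameter values.

```python
import random


def rde_mask(w, x, lam=0.1, sigma=1.0, steps=400, samples=32, lr=0.05, seed=0):
    """Toy perturbation-based relevance mask (hypothetical helper).

    Assumed setup, not confirmed by the source:
      model        f(x) = sum_i w[i] * x[i]   (linear stand-in for a network)
      perturbation y = s * x + (1 - s) * noise, noise ~ N(0, sigma^2)
      objective    E[(f(x) - f(y))^2] + lam * sum_i s[i],  with s in [0, 1]^n
    """
    rng = random.Random(seed)
    n = len(x)
    s = [1.0] * n  # start by keeping every input component

    for _ in range(steps):
        grad = [0.0] * n
        for _ in range(samples):
            noise = [rng.gauss(0.0, sigma) for _ in range(n)]
            diff = [x[i] - noise[i] for i in range(n)]
            # Output change f(x) - f(y) caused by the perturbation.
            r = sum(w[i] * (1.0 - s[i]) * diff[i] for i in range(n))
            # Monte Carlo estimate of the distortion gradient w.r.t. s.
            for i in range(n):
                grad[i] += 2.0 * r * (-w[i] * diff[i]) / samples
        # Gradient step on distortion + sparsity penalty, projected to [0, 1].
        s = [min(1.0, max(0.0, s[i] - lr * (grad[i] + lam))) for i in range(n)]

    return s
```

Calling `rde_mask([2.0, 0.0, 0.0], [1.0, 1.0, 1.0])` on this toy model should leave the mask close to 1 on the first component, the only one the model uses, and drive it to 0 on the other two. For a real network, the hand-written gradient would be replaced by automatic differentiation, which is why differentiability of the model matters for this kind of approach.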

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications · Generative Adversarial Networks and Image Synthesis