A Rate-Distortion Framework for Explaining Black-box Model Decisions
Stefan Kolek, Duc Anh Nguyen, Ron Levie, Joan Bruna, Gitta Kutyniok

TL;DR
The paper introduces the Rate-Distortion Explanation (RDE) framework, a mathematically grounded, perturbation-based method for explaining black-box model decisions across data types including images, audio, and physical simulations of urban environments.
Contribution
It proposes a novel, generalizable explanation framework based on rate-distortion theory applicable to any differentiable pre-trained model.
Findings
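The framework adapts to several data modalities: images, audio, and physical simulations of urban environments.
The explanation method is mathematically well-founded, building on rate-distortion theory.
It applies to neural networks and to any other differentiable pre-trained model.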
Abstract
We present the Rate-Distortion Explanation (RDE) framework, a mathematically well-founded method for explaining black-box model decisions. The framework is based on perturbations of the target input signal and applies to any differentiable pre-trained model such as neural networks. Our experiments demonstrate the framework's adaptability to diverse data modalities, particularly images, audio, and physical simulations of urban environments.
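The abstract describes the method only as perturbation-based and applicable to differentiable models, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes a relevance mask s in [0, 1]^d, replaces the masked-out part of the input with Gaussian noise, measures distortion as the expected squared change in the model output, and adds a sparsity penalty lam * ||s||_1. The toy linear `model`, the function name `rde_mask`, and all hyperparameters are hypothetical choices made for illustration.

```python
import random


def model(x, w):
    """Toy differentiable stand-in for the black box: a linear score w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def rde_mask(x, w, lam=0.2, sigma=1.0, steps=1000, lr=0.01, n_samples=8, seed=0):
    """Estimate a relevance mask s by projected stochastic gradient descent.

    Assumed objective (not taken from the abstract):
        E_n[(model(s*x + (1-s)*n) - model(x))^2] + lam * sum(s),
    with noise n ~ N(0, sigma^2) and s constrained to [0, 1].
    """
    rng = random.Random(seed)
    d = len(x)
    s = [1.0] * d  # start by keeping every input component
    target = model(x, w)
    for _ in range(steps):
        # Gradient of the sparsity term lam * ||s||_1 (s is non-negative).
        grad = [lam] * d
        for _ in range(n_samples):
            noise = [rng.gauss(0.0, sigma) for _ in range(d)]
            # Perturbed input: keep x where s is 1, use noise where s is 0.
            y = [si * xi + (1.0 - si) * ni for si, xi, ni in zip(s, x, noise)]
            diff = model(y, w) - target
            # Chain rule for the linear toy model:
            # d(diff^2)/ds_i = 2 * diff * w_i * (x_i - n_i)
            for i in range(d):
                grad[i] += 2.0 * diff * w[i] * (x[i] - noise[i]) / n_samples
        # Gradient step followed by projection back onto [0, 1].
        s = [min(1.0, max(0.0, si - lr * gi)) for si, gi in zip(s, grad)]
    return s


if __name__ == "__main__":
    # Components 0 and 3 carry weight in the toy model; 1 and 2 do not.
    mask = rde_mask(x=[1.0, 1.0, 1.0, 1.0], w=[3.0, 0.0, 0.0, 2.0])
    print([round(m, 3) for m in mask])
```

In this toy setting the mask should stay close to 1 on the components the model actually uses and shrink toward 0 on the ones it ignores. For a real network, the hand-written gradient would be replaced by automatic differentiation, which is why differentiability of the pre-trained model matters.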
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications · Generative Adversarial Networks and Image Synthesis
