A Functional Information Perspective on Model Interpretation
Itai Gat, Nitay Calderon, Roi Reichart, Tamir Hazan

TL;DR
This paper introduces a theoretical framework for model interpretability based on functional entropy and Fisher information, providing a principled way to quantify feature contributions in complex models.
Contribution
It proposes a novel interpretability method grounded in information theory, leveraging the log-Sobolev inequality to measure feature importance.
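For reference, the Gaussian log-Sobolev inequality can be written in its standard textbook form as below. The notation is an assumption made here for illustration and may differ from the paper's: f is a non-negative function, \mu is a Gaussian measure with covariance \Sigma, and \operatorname{Ent}_\mu denotes the functional entropy.

```latex
\operatorname{Ent}_\mu(f)
  \;=\; \int f \log f \, d\mu \;-\; \Big(\int f \, d\mu\Big) \log\Big(\int f \, d\mu\Big),
\qquad
\operatorname{Ent}_\mu(f)
  \;\le\; \frac{1}{2} \int \frac{\langle \Sigma \nabla f, \nabla f \rangle}{f} \, d\mu .
```

The right-hand side is the functional Fisher information term. When \Sigma is diagonal, it splits into a sum of per-coordinate terms, which is one way to see how a subset of input features could be assigned a share of the bound.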
Findings
Outperforms existing sampling-based interpretability methods
Effective across image, text, and audio data
Provides a theoretical basis for feature contribution measurement
Abstract
Contemporary predictive models are hard to interpret as their deep nets exploit numerous complex relations between input elements. This work suggests a theoretical framework for model interpretability by measuring the contribution of relevant features to the functional entropy of the network with respect to the input. We rely on the log-Sobolev inequality that bounds the functional entropy by the functional Fisher information with respect to the covariance of the data. This provides a principled way to measure the amount of information a subset of features contributes to the decision function. Through extensive experiments, we show that our method surpasses existing sampling-based interpretability methods on various data signals such as image, text, and audio.
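The relationship between functional entropy and the Fisher-information bound can be checked numerically on a toy problem. The following is a minimal sketch, not the authors' implementation. It assumes Gaussian sampling around an input `x` with a diagonal covariance, and it uses a hypothetical positive "decision function" `f(z) = exp(w . z)` with an analytic gradient. All names (`functional_entropy`, `fisher_contributions`, `w`, `sigma`) are illustrative.

```python
import numpy as np

def functional_entropy(values):
    """Plug-in Monte Carlo estimate of Ent(f) = E[f log f] - E[f] log E[f]."""
    mean = values.mean()
    return float((values * np.log(values)).mean() - mean * np.log(mean))

def fisher_contributions(values, grads, sigma):
    """Per-feature terms 0.5 * E[sigma_i^2 * (df/dz_i)^2 / f].

    Assumes a diagonal covariance with standard deviations `sigma`,
    so the bound is a plain sum over features.
    """
    return 0.5 * ((sigma ** 2) * grads ** 2 / values[:, None]).mean(axis=0)

# Hypothetical toy function (an assumption, not from the paper):
# f(z) = exp(w . z) is strictly positive, and feature 2 is ignored (w[2] = 0).
w = np.array([1.0, -0.5, 0.0])

def f(z):
    return np.exp(z @ w)

def grad_f(z):
    # Analytic gradient of exp(w . z) with respect to z.
    return f(z)[:, None] * w

rng = np.random.default_rng(0)
x = np.zeros(3)            # input point to explain
sigma = np.full(3, 0.5)    # assumed per-feature standard deviations
z = x + sigma * rng.standard_normal((200_000, 3))  # samples z ~ N(x, diag(sigma^2))

values = f(z)
grads = grad_f(z)

entropy = functional_entropy(values)
contrib = fisher_contributions(values, grads, sigma)
bound = float(contrib.sum())

print("functional entropy estimate:", entropy)
print("Fisher-information bound:   ", bound)
print("per-feature contributions:  ", contrib)
```

For this exponential toy function the Gaussian log-Sobolev inequality holds with equality, so the two printed numbers should agree up to Monte Carlo noise. The ignored feature receives a zero contribution, and the feature with the larger weight receives the larger share. A real network would replace `grad_f` with gradients from automatic differentiation.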
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Neural Networks and Applications
