Space Explanations of Neural Network Classification
Faezeh Labbaf, Tomáš Kolárik, Martin Blicha, Grigory Fedyukovich, Michael Wand, Natasha Sharygina

TL;DR
This paper introduces Space Explanations, a logic-based method that gives provable guarantees about a neural network classifier's behavior over continuous regions of the input feature space. The explanations are generated automatically and are reported to be more meaningful than those of existing methods.
Contribution
It proposes a novel logic-based framework using Craig interpolation and unsatisfiable core techniques to generate provable, region-based explanations for neural networks.
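For readers unfamiliar with the term, the standard definition of a Craig interpolant can be written as follows. The symbols A, B, and I are notation chosen here for illustration, not taken from the paper. One plausible reading, stated as an assumption, is that A constrains the inputs around a sample and B encodes the network together with the negated classification, so that an interpolant I speaks only about input features.

```latex
% Standard definition of a Craig interpolant (notation is illustrative).
% Given formulas A and B such that A \land B is unsatisfiable,
% a formula I is an interpolant for (A, B) if:
\begin{align*}
  & A \Rightarrow I, \\
  & I \land B \ \text{is unsatisfiable}, \\
  & \mathrm{vars}(I) \subseteq \mathrm{vars}(A) \cap \mathrm{vars}(B).
\end{align*}
```

Under the assumed reading above, I describes a region of the input space that contains the sample and on which the classification cannot change.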
Findings
More meaningful explanations than state-of-the-art methods
Effective across small to large neural network models
Provides provable guarantees of network behavior
Abstract
We present a novel logic-based concept called Space Explanations for neural network classifiers that gives provable guarantees of the behavior of the network in continuous areas of the input feature space. To automatically generate space explanations, we leverage a range of flexible Craig interpolation algorithms and unsatisfiable core generation. Based on real-life case studies, ranging from small to medium to large size, we demonstrate that the generated explanations are more meaningful than those computed by state-of-the-art methods.
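To make the idea of a guarantee over a continuous input region concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it swaps in a toy linear classifier and exact interval bounds in place of an SMT solver, and it mimics deletion-based unsatisfiable-core minimization by dropping feature constraints one at a time. All names (`min_score`, `holds`, `relax`) and all numbers are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]
Box = Dict[int, Interval]  # feature index -> (low, high)


def min_score(weights: Sequence[float], bias: float, box: Box) -> float:
    """Exact minimum of w.x + b over an axis-aligned box (linear model only)."""
    total = bias
    for i, w in enumerate(weights):
        lo, hi = box[i]
        # A positive weight is minimized at the low end, a negative one at the high end.
        total += w * lo if w >= 0 else w * hi
    return total


def holds(weights: Sequence[float], bias: float, box: Box) -> bool:
    """True if every point in the box is classified positive (score > 0)."""
    return min_score(weights, bias, box) > 0.0


def relax(
    weights: Sequence[float],
    bias: float,
    sample: Sequence[float],
    domain: Sequence[Interval],
) -> Tuple[Box, List[int]]:
    """Start from the sample point and free features while the guarantee holds.

    This is a deletion-style loop in the spirit of unsat-core minimization
    (an assumption of this sketch, not the authors' procedure).
    """
    box: Box = {i: (v, v) for i, v in enumerate(sample)}
    if not holds(weights, bias, box):
        raise ValueError("sample is not classified positive")
    freed: List[int] = []
    for i in range(len(sample)):
        trial = dict(box)
        trial[i] = domain[i]  # drop the constraint on feature i
        if holds(weights, bias, trial):
            box = trial
            freed.append(i)
    return box, freed


# Hypothetical toy classifier and sample, with every feature ranging over [0, 1].
weights = [2.0, -1.0, 0.5]
bias = -0.5
sample = [1.0, 0.2, 0.4]
domain = [(0.0, 1.0)] * 3

box, freed = relax(weights, bias, sample, domain)
print(freed)  # [1, 2]: only feature 0 must stay fixed
print(box)
```

The resulting box fixes feature 0 at 1.0 and lets the other two features range over their whole domain, so the positive classification is guaranteed on that entire region rather than only at the sample. For a real network the `holds` check would need a sound verifier or SMT solver in place of the interval formula.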
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Model Reduction and Neural Networks
