Entropy-based Logic Explanations of Neural Networks
Pietro Barbiero, Gabriele Ciravegna, Francesco Giannini, Pietro Liò, Marco Gori, Stefano Melacci

TL;DR
This paper introduces an entropy-based, differentiable method for extracting formal First-Order Logic explanations from neural networks, improving interpretability and performance in safety-critical applications.
Contribution
It presents a novel end-to-end differentiable approach that automatically identifies the most relevant concepts and provides concise logic explanations, outperforming state-of-the-art white-box models in accuracy while matching black-box performance.
Findings
Enables distillation of concise logic explanations in clinical and vision data
Outperforms state-of-the-art white-box models in accuracy
Matches black box model performance
Abstract
Explainable artificial intelligence has rapidly emerged since lawmakers have started requiring interpretable models for safety-critical domains. Concept-based neural networks have arisen as explainable-by-design methods as they leverage human-understandable symbols (i.e. concepts) to predict class memberships. However, most of these approaches focus on the identification of the most relevant concepts but do not provide concise, formal explanations of how such concepts are leveraged by the classifier to make predictions. In this paper, we propose a novel end-to-end differentiable approach enabling the extraction of logic explanations from neural networks using the formalism of First-Order Logic. The method relies on an entropy-based criterion which automatically identifies the most relevant concepts. We consider four different case studies to demonstrate that: (i) this entropy-based…
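The abstract does not spell out the entropy-based criterion, so the following is only a minimal sketch of one way such a criterion could work, not the authors' implementation. It assumes that each concept's relevance is the L1 norm of its input weights, that relevances are turned into a distribution with a temperature softmax, and that the entropy of that distribution is penalized so that few concepts dominate. The names `concept_relevance`, `entropy`, and `temperature` are hypothetical.

```python
import math


def concept_relevance(weights, temperature=1.0):
    """Return a probability distribution over concepts.

    weights: list of rows (one per hidden unit), each a list with one
    entry per input concept. Assumption: a concept's raw score is the
    sum of absolute weights attached to it (L1 norm of its column).
    """
    n_concepts = len(weights[0])
    scores = [sum(abs(row[j]) for row in weights) for j in range(n_concepts)]
    scaled = [s / temperature for s in scores]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Equal weights: relevance is spread uniformly, entropy is maximal.
uniform = concept_relevance([[1.0, 1.0, 1.0]])
# One dominant concept: relevance concentrates, entropy drops.
peaked = concept_relevance([[5.0, 0.0, 0.0]])
```

Under these assumptions, adding `entropy(...)` to the training loss would push the network toward the `peaked` case, where only a few concepts carry most of the relevance and the resulting logic explanation stays short.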
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
