Space Explanations of Neural Network Classification

Faezeh Labbaf; Tomáš Kolárik; Martin Blicha; Grigory Fedyukovich; Michael Wand; Natasha Sharygina

arXiv:2511.22498 · cs.LG · December 1, 2025

TL;DR

This paper introduces Space Explanations, a logic-based method that gives provable guarantees of a neural network classifier's behavior over continuous regions of the input feature space. The explanations are generated automatically and are reported to be more meaningful than those of state-of-the-art methods.

Contribution

It proposes a novel logic-based framework using Craig interpolation and unsatisfiable core techniques to generate provable, region-based explanations for neural networks.
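For readers unfamiliar with the two techniques named above, the standard textbook definitions can be written out as follows. This is general background only; how the paper instantiates the formulas $A$ and $B$ for a neural network is not stated in this summary, so the symbols below are generic placeholders rather than the authors' notation.

```latex
% Craig interpolant (standard definition).
% Let A and B be formulas such that A \land B is unsatisfiable.
% A formula I is a Craig interpolant for (A, B) if:
\begin{align*}
  & A \Rightarrow I, \\
  & I \land B \ \text{is unsatisfiable}, \\
  & \mathrm{symb}(I) \subseteq \mathrm{symb}(A) \cap \mathrm{symb}(B).
\end{align*}
% Unsatisfiable core (standard definition).
% For an unsatisfiable set of constraints \Phi = \{\varphi_1, \dots, \varphi_n\},
% a subset C \subseteq \Phi is an unsatisfiable core if
% \bigwedge_{\varphi \in C} \varphi is itself unsatisfiable.
```

Here $\mathrm{symb}(\cdot)$ denotes the non-logical symbols of a formula. Intuitively, an interpolant over-approximates $A$ using only the vocabulary shared with $B$, while an unsatisfiable core isolates a subset of constraints that already suffices for the contradiction.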

Findings

01 More meaningful explanations than state-of-the-art methods

02 Effective across small to large neural network models

03 Provides provable guarantees of network behavior

Abstract

We present a novel logic-based concept called Space Explanations for classifying neural networks that gives provable guarantees of the behavior of the network in continuous areas of the input feature space. To automatically generate space explanations, we leverage a range of flexible Craig interpolation algorithms and unsatisfiable core generation. Based on real-life case studies, ranging from small to medium to large size, we demonstrate that the generated explanations are more meaningful than those computed by state-of-the-art methods.
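To make the idea of a guarantee over a continuous input region concrete, here is a minimal Python sketch. It is not the paper's method: it swaps in plain interval bound propagation instead of Craig interpolation or unsatisfiable cores, and the tiny 2-2-2 ReLU network, its weights, and the helper names (`affine_bounds`, `relu_bounds`, `class_is_guaranteed`) are all hypothetical and chosen only for illustration. The check is sound but coarse: a `True` answer means every point in the box gets the target class, while `False` only means this simple analysis could not prove it.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)

def affine_bounds(weights: List[List[float]], bias: List[float],
                  box: List[Interval]) -> List[Interval]:
    """Sound bounds on each output of W x + b when x ranges over the box."""
    out = []
    for w_row, b in zip(weights, bias):
        lo = hi = b
        for w, (x_lo, x_hi) in zip(w_row, box):
            if w >= 0:
                lo += w * x_lo
                hi += w * x_hi
            else:
                # A negative weight flips which end of the interval is extreme.
                lo += w * x_hi
                hi += w * x_lo
        out.append((lo, hi))
    return out

def relu_bounds(box: List[Interval]) -> List[Interval]:
    """ReLU is monotone, so it can be applied to both interval ends."""
    return [(max(0.0, lo), max(0.0, hi)) for lo, hi in box]

# Hypothetical weights for a 2-input, 2-hidden, 2-class network.
W1 = [[1.0, -1.0], [0.5, 1.0]]
B1 = [0.0, -1.0]
W2 = [[1.0, 0.0], [0.0, 1.0]]
B2 = [0.0, 0.0]

def class_is_guaranteed(box: List[Interval], target: int) -> bool:
    """True if the target logit provably exceeds every other logit on the box."""
    hidden = relu_bounds(affine_bounds(W1, B1, box))
    logits = affine_bounds(W2, B2, hidden)
    target_lo = logits[target][0]
    return all(target_lo > hi
               for i, (_, hi) in enumerate(logits) if i != target)

small_box = [(2.0, 3.0), (0.0, 0.5)]   # region where class 0 can be proven
large_box = [(0.0, 3.0), (0.0, 2.0)]   # region too wide for this analysis
print(class_is_guaranteed(small_box, 0))
print(class_is_guaranteed(large_box, 0))
```

The box plays the role of a region-based explanation: any input inside `small_box` is certified to receive class 0. A logic-based approach such as the one the abstract describes aims at the same kind of region-level guarantee, but derives the region through solver-based reasoning rather than fixed interval arithmetic.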

Videos

Space Explanations of Neural Network Classification · underline

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Model Reduction and Neural Networks