Explaining Neural Networks with Reasons

Levin Hornischer; Hannes Leitgeb

arXiv:2505.14424 · cs.LG · May 21, 2025

TL;DR

This paper introduces a novel interpretability method for neural networks based on a philosophically grounded reasons vector, enabling logical and Bayesian explanations of neuron functions across architectures.

Contribution

The paper presents a scalable, uniform, and faithful interpretability method that combines a philosophical notion of explanation with practical neural network analysis.

Findings

01 The method is grounded in a philosophically established notion of explanation.

02 It applies to the common neural network architectures and modalities.

03 Interventions based on reason vectors lead to predictable changes in the output.

Abstract

We propose a new interpretability method for neural networks, which is based on a novel mathematico-philosophical theory of reasons. Our method computes a vector for each neuron, called its reasons vector. We then can compute how strongly this reasons vector speaks for various propositions, e.g., the proposition that the input image depicts digit 2 or that the input prompt has a negative sentiment. This yields an interpretation of neurons, and groups thereof, that combines a logical and a Bayesian perspective, and accounts for polysemanticity (i.e., that a single neuron can figure in multiple concepts). We show, both theoretically and empirically, that this method is: (1) grounded in a philosophically established notion of explanation, (2) uniform, i.e., applies to the common neural network architectures and modalities, (3) scalable, since computing reason vectors only involves…
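To make the idea of a per-neuron vector "speaking for" a proposition more concrete, here is a minimal sketch in plain Python. It is not the paper's definition: the abstract does not spell out how the reasons vector or the strength score is computed, so both `reasons_vector` (a neuron's activations over a dataset, rescaled to [0, 1]) and `strength_for` (mean value on inputs where the proposition holds minus mean value elsewhere) are hypothetical stand-ins chosen only for illustration, as is the toy data.

```python
from statistics import mean


def reasons_vector(activations):
    """ASSUMPTION: stand-in for a reasons vector. Rescales one neuron's
    activations over a dataset to the range [0, 1]."""
    lo, hi = min(activations), max(activations)
    if hi == lo:
        # A constant neuron carries no information about any input.
        return [0.0 for _ in activations]
    return [(a - lo) / (hi - lo) for a in activations]


def strength_for(vector, holds):
    """ASSUMPTION: stand-in score for how strongly the vector speaks for a
    proposition. `holds[i]` says whether the proposition is true of input i."""
    inside = [v for v, h in zip(vector, holds) if h]
    outside = [v for v, h in zip(vector, holds) if not h]
    if not inside or not outside:
        raise ValueError("proposition must hold on some inputs and fail on others")
    return mean(inside) - mean(outside)


# Toy data (made up): one neuron's activations on six inputs.
activations = [0.1, 2.0, 1.8, 0.0, 0.2, 1.9]
is_digit_2 = [False, True, True, False, False, True]
is_digit_7 = [True, False, False, False, True, False]

vec = reasons_vector(activations)
scores = {
    "digit 2": strength_for(vec, is_digit_2),
    "digit 7": strength_for(vec, is_digit_7),
}

# A positive score means the neuron fires more on inputs where the
# proposition holds; a negative score means it fires more elsewhere.
for proposition, score in scores.items():
    print(f"{proposition}: {score:+.3f}")
```

Because one vector is scored against several propositions, the same neuron can receive high scores for more than one of them, which is one simple way to picture the polysemanticity mentioned in the abstract.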

Code & Models

Repositories

levinhornischer/reasonsmethod (official)

Taxonomy

Topics: Neural Networks and Applications · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications