A Mechanistic Explanatory Strategy for XAI

Marcin Rabiza

arXiv:2411.01332·cs.LG·May 22, 2026

TL;DR

This paper proposes a mechanistic explanatory strategy for XAI, emphasizing the identification of functional components in deep learning systems to improve interpretability within a philosophical framework.

Contribution

The paper introduces a mechanistic approach to explaining deep neural networks, combining philosophical insights with practical case studies from image and language models.

Findings

01. Mechanistic explanations can reveal overlooked elements in AI systems.

02. Decomposition and localization aid in understanding neural network functions.

03. Case studies demonstrate the approach's alignment with interpretability research.

Abstract

Despite significant advancements in XAI, scholars note a persistent lack of solid conceptual foundations and integration with broader scientific discourse on explanation. In response, emerging research draws on explanatory strategies from various sciences and the philosophy of science literature to fill these gaps. This paper outlines a mechanistic strategy for explaining the functional organization of deep learning systems, situating recent developments in explainable AI within a broader philosophical context. According to the mechanistic approach, the explanation of opaque AI systems involves identifying mechanisms that drive decision making. For deep neural networks, this means discerning functionally relevant components, such as neurons, layers, circuits, or activation patterns, and understanding their roles through decomposition, localization, and recomposition. Proof-of-principle…
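As a rough illustration of the decomposition and localization steps the abstract mentions, the sketch below ablates each hidden unit of a toy network and measures how much the output changes. It is not taken from the paper: the weights, the inputs, and the function names (`forward`, `ablation_effects`) are hypothetical, and single-unit ablation is only one of several ways a component's functional role might be probed.

```python
from typing import List, Optional

# Hypothetical hand-set weights: 2 inputs -> 3 hidden ReLU units -> 1 output.
# These values are assumptions chosen for illustration, not from the paper.
W_HIDDEN = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one row per hidden unit
W_OUT = [2.0, -1.0, 0.0]  # unit 2 has zero output weight by construction


def hidden_activations(x: List[float]) -> List[float]:
    """Decomposition: expose the hidden units as separate components."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W_HIDDEN]


def forward(x: List[float], ablate: Optional[int] = None) -> float:
    """Run the network, optionally zeroing one hidden unit's activation."""
    h = hidden_activations(x)
    if ablate is not None:
        h[ablate] = 0.0
    return sum(w * a for w, a in zip(W_OUT, h))


def ablation_effects(inputs: List[List[float]]) -> List[float]:
    """Localization: mean absolute output change when each unit is removed."""
    effects = []
    for unit in range(len(W_HIDDEN)):
        total = sum(abs(forward(x) - forward(x, ablate=unit)) for x in inputs)
        effects.append(total / len(inputs))
    return effects


INPUTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
EFFECTS = ablation_effects(INPUTS)
print(EFFECTS)  # [2.0, 0.75, 0.0]
```

In this toy setting, unit 0 has the largest effect and unit 2 has none, so only the first two units would count as functionally relevant components. Recomposition would then ask how those units jointly produce the output.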

Taxonomy

Topics: Semantic Web and Ontologies · Advanced Database Systems and Queries

Methods: ALIGN