
TL;DR
This paper proposes a mechanistic explanatory strategy for explainable AI (XAI): identify the functionally relevant components of deep learning systems and the roles they play, and situate that work within the philosophy of science.
Contribution
It introduces a mechanistic approach to explain deep neural networks, integrating philosophical insights with practical case studies from image and language models.
Findings
Mechanistic explanations can reveal overlooked elements in AI systems.
Decomposition and localization aid in understanding neural network functions.
Case studies demonstrate the approach's alignment with interpretability research.
Abstract
Despite significant advancements in XAI, scholars note a persistent lack of solid conceptual foundations and integration with broader scientific discourse on explanation. In response, emerging research draws on explanatory strategies from various sciences and the philosophy of science literature to fill these gaps. This paper outlines a mechanistic strategy for explaining the functional organization of deep learning systems, situating recent developments in explainable AI within a broader philosophical context. According to the mechanistic approach, the explanation of opaque AI systems involves identifying mechanisms that drive decision making. For deep neural networks, this means discerning functionally relevant components, such as neurons, layers, circuits, or activation patterns, and understanding their roles through decomposition, localization, and recomposition. Proof-of-principle…
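As a rough illustration of what localization can look like in practice, the sketch below ablates each hidden unit of a tiny hand-built network and measures how much the output changes. Ablation is only one common localization technique and is an assumption here, not a method the abstract names. The network, its weights, and the helper names (`forward`, `ablation_effects`) are all hypothetical.

```python
# Hypothetical toy network: 3 inputs -> 4 hidden ReLU units -> 1 output.
# The weights are made up for illustration; nothing here comes from the paper.
W1 = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]
W2 = [2.0, 0.0, 0.1, 0.0]


def forward(x, ablate=None):
    """Run the toy network; optionally zero out one hidden unit."""
    hidden = [max(sum(w * xi for w, xi in zip(row, x)), 0.0) for row in W1]
    if ablate is not None:
        hidden[ablate] = 0.0  # "knock out" the candidate component
    return sum(w * h for w, h in zip(W2, hidden))


def ablation_effects(inputs):
    """Mean absolute change in output when each hidden unit is ablated."""
    effects = []
    for unit in range(len(W1)):
        diffs = [abs(forward(x) - forward(x, ablate=unit)) for x in inputs]
        effects.append(sum(diffs) / len(diffs))
    return effects


inputs = [[1.0, 1.0, 1.0], [2.0, 0.0, 1.0]]
effects = ablation_effects(inputs)

# The unit whose removal changes the output most is the best candidate
# for a functionally relevant component on these inputs.
most_relevant = max(range(len(effects)), key=lambda i: effects[i])
print(effects, most_relevant)
```

In this toy setup, hidden unit 0 dominates because its outgoing weight is largest. In a real network, scores like these would only be a starting point for the decomposition and recomposition steps the abstract describes.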
Taxonomy
Topics: Semantic Web and Ontologies · Advanced Database Systems and Queries
Methods: ALIGN
