Concept-Based Mechanistic Interpretability Using Structured Knowledge Graphs

Sofiia Chorna; Kateryna Tarelkina; Eloïse Berthier; Gianni Franchi

arXiv:2507.05810 · cs.LG · July 9, 2025

TL;DR

This paper introduces BAGEL, a knowledge graph-based framework for global mechanistic interpretability of neural networks, revealing how semantic concepts emerge and interact within models to improve understanding and trust.

Contribution

It presents a novel, scalable framework and visualization tool that extends concept-based interpretability into mechanistic insights using structured knowledge graphs.

Findings

01. Reveals how high-level concepts propagate through model layers

02. Identifies latent circuits and interactions underlying decisions

03. Helps detect spurious correlations and biases

Abstract

While concept-based interpretability methods have traditionally focused on local explanations of neural network predictions, we propose a novel framework and interactive tool that extends these methods into the domain of mechanistic interpretability. Our approach enables a global dissection of model behavior by analyzing how high-level semantic attributes (referred to as concepts) emerge, interact, and propagate through internal model components. Unlike prior work that isolates individual neurons or predictions, our framework systematically quantifies how semantic concepts are represented across layers, revealing latent circuits and information flow that underlie model decision-making. A key innovation is our visualization platform that we named BAGEL (for Bias Analysis with a Graph for global Explanation Layers), which presents these insights in a structured knowledge graph, allowing…
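To make the idea of quantifying how concepts are represented across layers more concrete, here is a minimal sketch. It is not the authors' implementation: it assumes a simple least-squares linear probe as the concept score, uses synthetic activations, and stores the resulting "graph" as a plain list of (layer, concept, score) edges. The names `probe_accuracy`, `build_concept_graph`, the `threshold` value, and the toy `stripes` concept are all hypothetical and chosen only for illustration.

```python
import numpy as np


def probe_accuracy(acts, labels):
    """Fit a least-squares linear probe and return its training accuracy.

    Assumption: a linear probe stands in for whatever concept score a real
    pipeline would use. Training accuracy is reported only for brevity.
    """
    X = np.hstack([acts, np.ones((acts.shape[0], 1))])  # append a bias column
    y = 2.0 * labels - 1.0                              # map {0, 1} to {-1, +1}
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    preds = (X @ w > 0).astype(int)
    return float((preds == labels).mean())


def build_concept_graph(layer_acts, concept_labels, threshold=0.75):
    """Return (layer, concept, score) edges for probes scoring >= threshold."""
    edges = []
    for layer, acts in layer_acts.items():
        for concept, labels in concept_labels.items():
            score = probe_accuracy(acts, labels)
            if score >= threshold:
                edges.append((layer, concept, score))
    return edges


# Synthetic data (hypothetical): 200 samples with one binary concept.
rng = np.random.default_rng(0)
n = 200
stripes = rng.integers(0, 2, size=n)

layer_acts = {
    # Pure noise: the concept should not be linearly decodable here.
    "layer1": rng.normal(size=(n, 8)),
    # The concept is encoded along one direction, plus small noise.
    "layer2": 0.1 * rng.normal(size=(n, 8))
    + np.outer(2.0 * stripes - 1.0, np.ones(8)),
}
concept_labels = {"stripes": stripes}

edges = build_concept_graph(layer_acts, concept_labels)
for layer, concept, score in edges:
    print(f"{layer} -> {concept} (probe accuracy {score:.2f})")
```

In this toy setup only the layer that actually encodes the concept gets an edge, which is the kind of layer-to-concept link a structured knowledge graph could display. A real analysis would use held-out data for the probe score and activations from an actual model.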

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks · Adversarial Robustness in Machine Learning