Hierarchical Sparse Circuit Extraction from Billion-Parameter Language Models through Scalable Attribution Graph Decomposition
Mohammed Mudassir Uddin, Shahnawaz Alam, Mohammed Kaif Pasha

TL;DR
This paper introduces Hierarchical Attribution Graph Decomposition (HAGD), a scalable framework for extracting sparse, interpretable circuits from large language models. It reduces the complexity of the circuit search and validates the extracted circuits through causal interventions.
Contribution
The paper presents a hierarchical attribution graph decomposition method that reduces circuit discovery in billion-parameter models from O(2^n) exhaustive enumeration to O(n^2 log n).
Findings
HAGD achieves up to 91% behavioral preservation on modular arithmetic tasks.
Discovered circuits show 67% structural similarity across different model architectures.
The framework is evaluated on GPT-2 variants, Llama-7B through Llama-70B, and the Pythia suite.
Abstract
Mechanistic interpretability seeks to reverse-engineer neural network computations into human-understandable algorithms, yet extracting sparse computational circuits from billion-parameter language models remains challenging due to exponential search complexity and pervasive polysemanticity. The proposed Hierarchical Attribution Graph Decomposition (HAGD) framework reduces circuit discovery complexity from O(2^n) exhaustive enumeration to O(n^2 log n) through multi-resolution abstraction hierarchies and differentiable circuit search. The methodology integrates cross-layer transcoders for monosemantic feature extraction, graph neural network meta-learning for topology prediction, and causal intervention protocols for validation. Empirical evaluation spans GPT-2 variants, Llama-7B through Llama-70B, and Pythia suite models across algorithmic tasks and natural language benchmarks. On…
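To give a sense of the gap between the two complexity classes named in the abstract, the following is a minimal sketch that tabulates 2^n against n^2 log n for a few small n. The function names, the base-2 logarithm, and the unit constant factor are assumptions for illustration only; the abstract states only the asymptotic forms, and the sketch does not implement HAGD itself.

```python
import math


def exhaustive_subsets(n: int) -> int:
    """Count of candidate component subsets under exhaustive enumeration: 2^n."""
    return 2 ** n


def hierarchical_cost(n: int) -> float:
    """Growth term n^2 * log2(n).

    Assumption: base-2 logarithm and a constant factor of 1; the source
    gives only the asymptotic form O(n^2 log n).
    """
    return n * n * math.log2(n)


if __name__ == "__main__":
    # Print both growth terms side by side for a few component counts.
    for n in (8, 16, 32, 64):
        print(
            f"n={n:>3}  2^n={exhaustive_subsets(n):.3e}  "
            f"n^2 log2 n={hierarchical_cost(n):.3e}"
        )
```

These are growth terms, not measured runtimes, so the comparison indicates only how quickly the exhaustive search space outpaces the polynomial bound as n grows.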
Taxonomy
Topics: Advanced Graph Neural Networks · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
