Hierarchical Sparse Circuit Extraction from Billion-Parameter Language Models through Scalable Attribution Graph Decomposition

Mohammed Mudassir Uddin; Shahnawaz Alam; Mohammed Kaif Pasha

arXiv:2601.12879 · cs.LG · January 21, 2026

Open Access

TL;DR

This paper introduces HAGD, a scalable framework that extracts sparse, interpretable circuits from large language models by reducing search complexity, and validates the extracted circuits through causal interventions.

Contribution

The paper presents a hierarchical attribution graph decomposition method that reduces the complexity of circuit discovery in billion-parameter models from O(2^n) exhaustive enumeration to O(n^2 log n).

Findings

01. Achieves up to 91% behavioral preservation on modular arithmetic tasks.

02. Discovered circuits show 67% structural similarity across different model architectures.

03. Framework scales to models like GPT-2, Llama, and Pythia, demonstrating broad applicability.

Abstract

Mechanistic interpretability seeks to reverse-engineer neural network computations into human-understandable algorithms, yet extracting sparse computational circuits from billion-parameter language models remains challenging due to exponential search complexity and pervasive polysemanticity. The proposed Hierarchical Attribution Graph Decomposition (HAGD) framework reduces circuit discovery complexity from O(2^n) exhaustive enumeration to O(n^2 log n) through multi-resolution abstraction hierarchies and differentiable circuit search. The methodology integrates cross-layer transcoders for monosemantic feature extraction, graph neural network meta-learning for topology prediction, and causal intervention protocols for validation. Empirical evaluation spans GPT-2 variants, Llama-7B through Llama-70B, and Pythia suite models across algorithmic tasks and natural language benchmarks. On…
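To give a sense of scale for the complexity claim in the abstract, the following minimal sketch compares the two growth rates numerically. It is only arithmetic on the stated bounds, not the HAGD algorithm itself. The function names are hypothetical, `n` is assumed to be the number of candidate components in the search, and base-2 logarithms with constant factors of 1 are chosen purely for illustration.

```python
import math


def exhaustive_subsets(n: int) -> int:
    """Number of candidate circuits under exhaustive enumeration: 2^n."""
    return 2 ** n


def hierarchical_bound(n: int) -> float:
    """Illustrative n^2 * log2(n) operation count (constant factor assumed to be 1)."""
    return n * n * math.log2(n)


def compare(sizes):
    """Return (n, 2^n, n^2 log2 n) for each problem size."""
    return [(n, exhaustive_subsets(n), hierarchical_bound(n)) for n in sizes]


if __name__ == "__main__":
    # Print both bounds side by side for a few hypothetical component counts.
    for n, exhaustive, hierarchical in compare([10, 20, 40, 80]):
        print(f"n={n:>3}  2^n={exhaustive:.3e}  n^2 log2 n={hierarchical:.3e}")
```

Even at small `n` the exponential term dominates, which is why an exhaustive search over component subsets is impractical at the scale of the models listed above.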

Taxonomy

Topics: Advanced Graph Neural Networks · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning