Finding Transformer Circuits with Edge Pruning
Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen

TL;DR
This paper introduces Edge Pruning, a scalable gradient-based method for discovering sparse, interpretable circuits in language models such as GPT-2 and CodeLlama-13B. It finds sparser circuits than prior approaches while remaining equally faithful to the full model.
Contribution
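The paper presents Edge Pruning, an optimization-based approach to automated circuit discovery that prunes the edges between model components rather than the components themselves. The resulting circuits have far fewer edges yet remain faithful to the full model, and the method makes much larger models tractable to analyze.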
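One way to read "optimization-based" is as minimizing a faithfulness loss plus a sparsity penalty over continuous per-edge masks. The sketch below is a minimal illustration under stated assumptions, not the paper's exact formulation. It uses a KL-divergence faithfulness term between the full model's and the circuit's next-token distributions and a hypothetical penalty weight `lam` on the sum of mask values. The paper's own sparsity control may differ.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def circuit_objective(full_probs, circuit_probs, edge_masks, lam):
    """Hypothetical circuit-discovery objective (illustrative assumption).

    full_probs:    next-token distribution of the full model
    circuit_probs: next-token distribution when only masked-in edges are used
    edge_masks:    one continuous mask value in [0, 1] per edge
    lam:           weight of the sparsity penalty (assumed hyperparameter)
    """
    faithfulness = kl_divergence(full_probs, circuit_probs)
    sparsity = lam * sum(edge_masks)  # fewer kept edges -> smaller penalty
    return faithfulness + sparsity
```

Keeping every edge reproduces the full model (zero KL) but pays the largest penalty. Dropping edges lowers the penalty but usually raises the KL term. The weight `lam` sets where that trade-off lands.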
Findings
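Edge Pruning finds circuits in GPT-2 with fewer than half the edges of circuits found by previous methods, while being equally faithful on standard circuit-finding tasks.
It scales to models over 100 times larger than those prior methods were applied to, up to CodeLlama-13B.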
The method perfectly recovers ground-truth circuits in compiled models.
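To make "edges between components" concrete, the sketch below counts the edges in a hypothetical component graph of a transformer. It assumes GPT-2-small dimensions (12 layers, 12 heads per layer). The graph granularity is an assumption. Senders are the token embedding, every attention head, and every MLP. Receivers are each head's query, key, and value inputs, each MLP input, and the final logits. Each receiver reads from every sender that precedes it. Under these assumptions, the count shows how much larger an edge-level search space is than a node-level one.

```python
def count_edges(n_layers, n_heads, qkv_separate=True):
    """Count edges in a hypothetical component graph of a decoder-only transformer.

    Assumed senders: token embedding, every attention head, every MLP.
    Assumed receivers: each head's query/key/value inputs (one input if
    qkv_separate is False), each MLP input, and the final logits.
    """
    head_inputs = 3 if qkv_separate else 1
    edges = 0
    for layer in range(n_layers):
        # Senders upstream of this layer's heads: embedding + all earlier heads and MLPs.
        upstream_of_heads = 1 + layer * n_heads + layer
        edges += n_heads * head_inputs * upstream_of_heads
        # The MLP also reads this layer's heads (attention precedes the MLP in a block).
        upstream_of_mlp = upstream_of_heads + n_heads
        edges += upstream_of_mlp
    # The logits read from every sender: embedding, all heads, all MLPs.
    edges += 1 + n_layers * n_heads + n_layers
    return edges


print(count_edges(12, 12))  # → 32491
```

Under these assumptions, a GPT-2-small-sized graph has 32,491 prunable edges. A node-level method would instead choose among only 12 × 12 + 12 = 156 heads and MLPs.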
Abstract
The path to interpreting a language model often proceeds via analysis of circuits -- sparse computational subgraphs of the model that capture specific aspects of its behavior. Recent work has automated the task of discovering circuits. Yet, these methods have practical limitations, as they rely either on inefficient search algorithms or inaccurate approximations. In this paper, we frame automated circuit discovery as an optimization problem and propose *Edge Pruning* as an effective and scalable solution. Edge Pruning leverages gradient-based pruning techniques, but instead of removing neurons or components, it prunes the *edges* between components. Our method finds circuits in GPT-2 that use less than half the number of edges compared to circuits found by previous methods while being equally faithful to the full model predictions on standard circuit-finding tasks. Edge Pruning is…
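Pruning edges rather than components can be pictured as placing a learnable gate on every sender-to-receiver connection. The sketch below is a minimal illustration under stated assumptions. Each gate uses the hard concrete relaxation from L0 pruning (Louizos et al., 2018), parameterized by a per-edge `log_alpha`. A pruned edge contributes the sender's activation from a corrupted input rather than its clean activation, in the style of interchange ablation. Function names are hypothetical, and activations are plain Python lists to keep the example dependency-free.

```python
import math
import random

# Standard hard concrete stretch/temperature constants (Louizos et al., 2018).
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_mask(log_alpha, rng):
    """Sample a hard concrete gate in [0, 1] for one edge (training time)."""
    u = rng.uniform(1e-6, 1.0 - 1e-6)  # avoid log(0)
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    return min(1.0, max(0.0, s * (ZETA - GAMMA) + GAMMA))


def deterministic_mask(log_alpha):
    """Noise-free gate for evaluation: stretch the sigmoid, then clip to [0, 1]."""
    return min(1.0, max(0.0, sigmoid(log_alpha) * (ZETA - GAMMA) + GAMMA))


def expected_l0(log_alpha):
    """Probability that the gate is nonzero; summed over edges, a differentiable edge count."""
    return sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))


def receiver_input(clean, corrupted, masks):
    """Input to one receiver (hypothetical helper): each incoming edge contributes
    its clean activation where kept (mask 1) and its corrupted activation where
    pruned (mask 0); fractional masks interpolate between the two."""
    dim = len(clean[0])
    out = [0.0] * dim
    for c, k, z in zip(clean, corrupted, masks):
        for i in range(dim):
            out[i] += z * c[i] + (1.0 - z) * k[i]
    return out


rng = random.Random(0)
print(sample_mask(0.0, rng), expected_l0(0.0))
print(receiver_input([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [10.0, 10.0]], [1.0, 0.0]))
```

In the last call, the first edge is kept and the second is pruned, so the receiver sees `[1.0, 2.0] + [10.0, 10.0]`. Because the gates are differentiable almost everywhere, the `log_alpha` values can be trained by gradient descent, with the sum of `expected_l0` over edges serving as the sparsity term.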
Taxonomy
Methods: Attention Is All You Need · Cosine Annealing · Residual Connection · Discriminative Fine-Tuning · Weight Decay · Softmax · Layer Normalization · Byte Pair Encoding
