EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification
Lin Zhang, Wenshuo Dong, Zhuoran Zhang, Shu Yang, Lijie Hu, Ninghao Liu, Pan Zhou, and Di Wang

TL;DR
This paper introduces EAP-GP, a new method to improve gradient-based circuit identification in neural networks by addressing saturation effects, leading to more reliable and faithful circuit discovery in transformer models.
Contribution
EAP-GP employs an adaptive integration path to mitigate saturation, significantly enhancing the accuracy and reliability of circuit identification in language models.
Findings
EAP-GP outperforms existing methods with up to 17.7% improvement in faithfulness.
Measured against manually annotated ground-truth circuits, EAP-GP achieves precision and recall comparable to or better than those of previous approaches.
Experimental validation on 6 datasets with GPT-2 models confirms its effectiveness.
Abstract
Understanding the internal mechanisms of transformer-based language models remains challenging. Mechanistic interpretability based on circuit discovery aims to reverse engineer neural networks by analyzing their internal processes at the level of computational subgraphs. In this paper, we revisit existing gradient-based circuit identification methods and find that their performance is affected by either the zero-gradient problem or saturation effects, where edge attribution scores become insensitive to input changes, resulting in noisy and unreliable attribution evaluations for circuit components. To address the saturation effect, we propose Edge Attribution Patching with GradPath (EAP-GP). EAP-GP introduces an integration path that starts from the input and adaptively follows the direction of the difference between the gradients of corrupted and clean inputs to avoid the saturated…
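The saturation effect described in the abstract can be illustrated with a toy example. The sketch below is not the paper's GradPath procedure: it assumes a hypothetical one-layer saturating "model" `f` and compares a single-point gradient attribution with an attribution integrated along a plain straight-line path between the clean and corrupted inputs. All function names and inputs are illustrative assumptions.

```python
import numpy as np

def f(z):
    # Hypothetical toy "model": a steep sigmoid that saturates for large |sum(z)|.
    return 1.0 / (1.0 + np.exp(-10.0 * z.sum()))

def grad_f(z):
    # Analytic gradient of f with respect to each input component.
    s = f(z)
    return 10.0 * s * (1.0 - s) * np.ones_like(z)

def single_point_attribution(clean, corrupted):
    # First-order estimate: input difference times the gradient at the clean input.
    return (corrupted - clean) * grad_f(clean)

def path_integrated_attribution(clean, corrupted, steps=200):
    # Midpoint Riemann sum of the gradient along the straight line
    # from clean to corrupted (a fixed path, not an adaptive one).
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([grad_f(clean + a * (corrupted - clean)) for a in alphas])
    return (corrupted - clean) * grads.mean(axis=0)

clean = np.array([1.0, 1.0])       # f(clean) is saturated near 1
corrupted = np.array([-1.0, -1.0])  # f(corrupted) is saturated near 0

true_change = f(corrupted) - f(clean)
single = single_point_attribution(clean, corrupted)
integrated = path_integrated_attribution(clean, corrupted)

print("true output change:      ", true_change)
print("single-point attribution:", single.sum())
print("path-integrated:         ", integrated.sum())
```

Because the gradient at the saturated clean input is close to zero, the single-point estimate barely registers a change that flips the output, while the path-integrated scores sum to roughly the true output difference. According to the abstract, EAP-GP goes further by choosing the path adaptively; that step is not implemented here.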
