Skill Path: Unveiling Language Skills from Circuit Graphs
Hang Chen, Jiaying Zhu, Xinyu Yang, Wenya Wang

TL;DR
This paper introduces skill paths, a refined method for extracting and isolating specific language skills from circuit graphs in language models, addressing limitations of previous approaches.
Contribution
It proposes a three-step framework for extracting skill paths that improve interpretability and disentanglement of language model skills from circuit graphs.
Findings
Skill paths effectively isolate individual language skills.
Experiments demonstrate stratification and inclusiveness of skills.
Framework enhances interpretability of circuit graphs.
Abstract
Circuit graph discovery has emerged as a fundamental approach to elucidating the skill mechanisms of language models. Despite the output faithfulness of circuit graphs, they suffer from atomic ablation, which causes the loss of causal dependencies between connected components. In addition, their discovery process, designed to preserve output faithfulness, inadvertently captures extraneous effects other than an isolated target skill. To alleviate these challenges, we introduce skill paths, which offer a more refined and compact representation by isolating individual skills within a linear chain of components. To enable skill path extraction from circuit graphs, we propose a three-step framework consisting of decomposition, pruning, and post-pruning causal mediation. In particular, we offer a complete linear decomposition of the transformer model which leads to a disentangled…
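To make the idea of a "linear chain of components" concrete, here is a minimal toy sketch of the difference between a circuit graph and the paths inside it. It is not the authors' implementation: the component names, the edge scores, the threshold, and the helper functions `prune_edges` and `enumerate_paths` are all hypothetical, and the third step of the framework (post-pruning causal mediation) is not modeled at all. The sketch only shows that, once low-scoring edges are pruned from an acyclic graph, the remaining structure can be read off as source-to-sink chains.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def prune_edges(scores: Dict[Edge, float], threshold: float) -> List[Edge]:
    """Keep only edges whose (hypothetical) importance score reaches the threshold."""
    return [edge for edge, score in scores.items() if score >= threshold]


def enumerate_paths(edges: List[Edge], source: str, sink: str) -> List[List[str]]:
    """List every source-to-sink chain in an acyclic graph via depth-first search."""
    children: Dict[str, List[str]] = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    paths: List[List[str]] = []

    def walk(node: str, prefix: List[str]) -> None:
        prefix = prefix + [node]
        if node == sink:
            paths.append(prefix)
            return
        for nxt in children.get(node, []):
            walk(nxt, prefix)

    walk(source, [])
    return paths


# Toy circuit graph: made-up component names and made-up edge scores.
toy_scores: Dict[Edge, float] = {
    ("input", "attn_0"): 0.9,
    ("input", "mlp_0"): 0.2,
    ("attn_0", "mlp_1"): 0.8,
    ("mlp_0", "mlp_1"): 0.7,
    ("attn_0", "output"): 0.1,
    ("mlp_1", "output"): 0.95,
}

kept = prune_edges(toy_scores, threshold=0.5)
paths = enumerate_paths(kept, "input", "output")
print(paths)
```

In this toy graph the edge from `mlp_0` to `mlp_1` survives pruning but no longer lies on any chain from `input` to `output`, which illustrates why a path-level view can be more compact than the pruned graph itself.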
Taxonomy
Topics: Second Language Learning and Teaching · EFL/ESL Teaching and Learning · Multilingual Education and Policy
Methods: Focus · Pruning
