SCALPEL: Selective Capability Ablation via Low-rank Parameter Editing for Large Language Model Interpretability Analysis
Zihao Fu, Xufeng Duan, Zhenguang G. Cai

TL;DR
SCALPEL is a low-rank parameter editing framework for interpreting and selectively removing capabilities in large language models. Its results indicate that capabilities are encoded in a distributed, fine-grained manner across model parameters.
Contribution
This work proposes representing capabilities as low-rank subspaces, which enables precise ablation of a target capability without affecting other functionalities and advances the interpretability of LLMs.
Findings
Capabilities exhibit low-rank structure in parameter space.
Targeted low-rank modifications can selectively remove specific capabilities.
SCALPEL preserves general language modeling quality while ablating targeted capabilities.
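To make the idea of a "targeted low-rank modification" concrete, the sketch below applies a generic low-rank edit W' = W + A B to a small weight matrix, with a rank-1 case that removes the output component along a single direction. This is a toy illustration under stated assumptions, not the SCALPEL procedure: the summary above does not specify how the low-rank factors are obtained, and the names low_rank_edit, ablate_direction, A, B, and u are hypothetical.

```python
# Toy illustration only: a generic low-rank edit W' = W + A @ B.
# This is NOT the SCALPEL training procedure; all names are hypothetical.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def low_rank_edit(W, A, B):
    """Return W + A @ B, where A is (d_out x r) and B is (r x d_in)."""
    return add(W, matmul(A, B))

def ablate_direction(W, u):
    """Rank-1 edit removing the output component along unit vector u.

    Equivalent to W' = W - u (u^T W): A = -u as a column, B = u^T W as a row.
    """
    A = [[-ui] for ui in u]   # d_out x 1
    B = matmul([u], W)        # 1 x d_in
    return low_rank_edit(W, A, B)

W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 x 2 weight matrix
u = [0.0, 1.0, 0.0]                       # assumed unit-norm output direction
W_edited = ablate_direction(W, u)
```

In this toy case the edited matrix no longer produces any output along u, while the rows orthogonal to u are unchanged. That mirrors, in miniature, the goal of removing one capability while leaving the rest of the model intact.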
Abstract
Large language models excel across diverse domains, yet their deployment in healthcare, legal systems, and autonomous decision-making remains limited by incomplete understanding of their internal mechanisms. As these models integrate into high-stakes systems, understanding how they encode capabilities has become fundamental to interpretability research. Traditional approaches identify important modules through gradient attribution or activation analysis, assuming specific capabilities map to specific components. However, this oversimplifies neural computation: modules may contribute to multiple capabilities simultaneously, while single capabilities may distribute across multiple modules. These coarse-grained analyses fail to capture fine-grained, distributed capability encoding. We present SCALPEL (Selective Capability Ablation via Low-rank Parameter Editing for Large language models),…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education
