SCALPEL: Selective Capability Ablation via Low-rank Parameter Editing for Large Language Model Interpretability Analysis

Zihao Fu; Xufeng Duan; Zhenguang G. Cai

arXiv:2601.07411 · cs.LG · January 13, 2026

TL;DR

SCALPEL introduces a low-rank parameter editing framework to interpret and selectively remove capabilities in large language models, revealing their distributed and fine-grained encoding across model parameters.

Contribution

This work proposes representing capabilities as low-rank subspaces, enabling precise capability ablation without affecting other functionalities, advancing interpretability of LLMs.

Findings

01. Capabilities exhibit low-rank structure in parameter space.

02. Targeted low-rank modifications can selectively remove specific capabilities.

03. SCALPEL preserves general language modeling quality while ablating targeted capabilities.
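
To make the idea of a "targeted low-rank modification" concrete, the following is a minimal NumPy sketch of a generic rank-r edit to a single weight matrix. It is not SCALPEL's actual procedure, which this page does not describe. The names `low_rank_edit`, `W`, `A`, and `B`, the matrix sizes, and the random factors are all assumptions chosen for illustration. The sketch only shows two general linear-algebra facts: the change to the weights has rank at most r, and inputs orthogonal to the column span of `B` pass through the edited matrix unchanged.

```python
import numpy as np

def low_rank_edit(W, A, B):
    """Return (W + A @ B.T, A @ B.T): a generic rank-r edit of W.

    Hypothetical helper for illustration only, not the paper's method.
    A has shape (d_out, r) and B has shape (d_in, r).
    """
    delta = A @ B.T
    return W + delta, delta

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2               # toy sizes, chosen arbitrarily

W = rng.standard_normal((d_out, d_in))  # stand-in for a model weight matrix
A = rng.standard_normal((d_out, r))     # random factors; a real method would learn these
B = rng.standard_normal((d_in, r))

W_edited, delta = low_rank_edit(W, A, B)

# The change to the weights has rank at most r.
delta_rank = np.linalg.matrix_rank(delta, tol=1e-8)

# Build an input orthogonal to span(B): remove its projection onto span(B).
Q, _ = np.linalg.qr(B)                  # orthonormal basis for span(B), shape (d_in, r)
x = rng.standard_normal(d_in)
x_perp = x - Q @ (Q.T @ x)

# Such inputs are untouched by the edit, because delta @ x_perp = A @ (B.T @ x_perp) = 0.
unchanged = np.allclose(W_edited @ x_perp, W @ x_perp)
```

In this toy setting the edit affects only an r-dimensional slice of the input space, which is one simple way a small parameter change can alter some behavior while leaving the rest of a linear map intact. Whether and how this carries over to capabilities in a full language model is the question the paper itself addresses.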

Abstract

Large language models excel across diverse domains, yet their deployment in healthcare, legal systems, and autonomous decision-making remains limited by incomplete understanding of their internal mechanisms. As these models integrate into high-stakes systems, understanding how they encode capabilities has become fundamental to interpretability research. Traditional approaches identify important modules through gradient attribution or activation analysis, assuming specific capabilities map to specific components. However, this oversimplifies neural computation: modules may contribute to multiple capabilities simultaneously, while single capabilities may distribute across multiple modules. These coarse-grained analyses fail to capture fine-grained, distributed capability encoding. We present SCALPEL (Selective Capability Ablation via Low-rank Parameter Editing for Large language models),…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education