Auxiliary Metrics Help Decoding Skill Neurons in the Wild
Yixiu Zhao, Xiaozhi Wang, Zijun Yao, Lei Hou, Juanzi Li

TL;DR
This paper presents a lightweight method that uses auxiliary metrics to identify and interpret skill neurons in large language models across various tasks, revealing both known and hidden capabilities.
Contribution
The authors extend prior skill neuron identification techniques by correlating neuron activations with auxiliary metrics, enabling analysis in complex multi-skill scenarios without manual token aggregation.
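The correlation step described above can be illustrated with a minimal sketch. It assumes that activations have already been collected into a samples-by-neurons matrix and that the auxiliary metric is one scalar per sample. The Pearson statistic, the function name `rank_neurons_by_correlation`, and the synthetic data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def rank_neurons_by_correlation(activations, metric, top_k=5):
    """Rank neurons by |Pearson correlation| with an auxiliary metric.

    Hypothetical helper: Pearson correlation is an assumption here;
    the paper may use a different statistic.
    activations: array of shape (n_samples, n_neurons)
    metric: array of shape (n_samples,), e.g. a label or confidence score
    """
    acts = np.asarray(activations, dtype=float)
    m = np.asarray(metric, dtype=float)

    # Center both quantities so the dot product below is a covariance term.
    acts_c = acts - acts.mean(axis=0)
    m_c = m - m.mean()

    cov = acts_c.T @ m_c
    denom = np.linalg.norm(acts_c, axis=0) * np.linalg.norm(m_c)

    # Constant neurons (zero variance) get correlation 0 instead of NaN.
    corr = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)

    order = np.argsort(-np.abs(corr))[:top_k]
    return order, corr[order]


# Synthetic data only: neuron 3 is constructed to track the metric.
rng = np.random.default_rng(0)
metric = rng.normal(size=200)
activations = rng.normal(size=(200, 16))
activations[:, 3] = 2.0 * metric + 0.1 * rng.normal(size=200)

top, scores = rank_neurons_by_correlation(activations, metric, top_k=3)
print(top, scores)
```

In this toy setup the planted neuron should rank first. With real model activations, a high correlation would only flag a candidate neuron for further inspection.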
Findings
Effectively detects skill neurons in open-ended text generation and inference tasks.
Uncovers previously unknown shortcuts in arithmetic reasoning.
Demonstrates broad applicability across diverse NLP tasks.
Abstract
Large language models (LLMs) exhibit remarkable capabilities across a wide range of tasks, yet their internal mechanisms remain largely opaque. In this paper, we introduce a simple, lightweight, and broadly applicable method with a focus on isolating neurons that encode specific skills. Building upon prior work that identified "skill neurons" via soft prompt training on classification tasks, our approach extends the analysis to complex scenarios involving multiple skills. We correlate neuron activations with auxiliary metrics -- such as external labels and the model's own confidence score -- thereby uncovering interpretable and task-specific behaviors without the need for manual token aggregation. We empirically validate our method on tasks spanning open-ended text generation and natural language inference, demonstrating its ability to detect neurons that not only drive known skills but…
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
