Identifying Good and Bad Neurons for Task-Level Controllable LLMs

Wenjie Li; Guansong Pang; Hezhe Qiao; Debin Gao; David Lo

arXiv:2601.04548·cs.CL·March 6, 2026

Open Access · 4 Reviews

TL;DR

This paper introduces NeuronLLM, a framework that identifies both supportive and inhibitive neurons in large language models to better understand and control their task performance.

Contribution

NeuronLLM applies the biological concept of functional antagonism and contrastive learning to holistically model neurons, addressing limitations of previous methods focused only on positive roles.

Findings

01. NeuronLLM outperforms existing methods across multiple NLP tasks.

02. It provides new insights into the functional organization of neurons in LLMs.

03. The framework is effective across different model sizes and architectures.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities on multiple-choice question answering benchmarks, but the complex mechanisms underlying their large-scale neurons remain opaque, posing significant challenges for understanding and steering LLMs. While recent studies have made progress in identifying the neurons responsible for certain abilities, these ability-specific methods are infeasible for task-focused scenarios that require the coordinated use of multiple abilities. Moreover, these approaches focus only on supportive neurons that correlate positively with task completion, while neglecting neurons with other roles (such as inhibitive roles) and the misleading neuron attributions caused by fortuitous behaviors in LLMs (i.e., correctly answering questions by chance rather than through genuine understanding). To address these challenges, we propose NeuronLLM, a novel task-level LLM understanding framework…
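The distinction between supportive and inhibitive neurons can be pictured with a generic ablation experiment. The sketch below is not NeuronLLM's scoring method, which this excerpt does not specify. It assumes a toy one-layer readout with hand-picked weights, zeroes each hidden neuron in turn, and records how the cross-entropy of the correct answer option changes. All names (`cross_entropy`, `output_logits`, `ablation_scores`) and all numbers are hypothetical.

```python
import math

def cross_entropy(logits, correct):
    # Negative log-probability of the correct option (stable log-sum-exp).
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[correct]

def output_logits(hidden, w_out):
    # Linear readout: one logit per answer option.
    n_options = len(w_out[0])
    return [sum(h * row[j] for h, row in zip(hidden, w_out))
            for j in range(n_options)]

def ablation_scores(hidden, w_out, correct):
    """Loss change per neuron when that neuron is zeroed.

    Positive score: loss rises without the neuron (supportive).
    Negative score: loss falls without the neuron (inhibitive).
    """
    base = cross_entropy(output_logits(hidden, w_out), correct)
    scores = []
    for i in range(len(hidden)):
        ablated = list(hidden)
        ablated[i] = 0.0
        loss = cross_entropy(output_logits(ablated, w_out), correct)
        scores.append(loss - base)
    return scores

# Toy setup (assumed): 3 hidden neurons, 2 answer options, option 0 correct.
hidden = [1.0, 1.0, 1.0]
w_out = [
    [2.0, 0.0],  # neuron 0 votes for the correct option
    [0.0, 1.5],  # neuron 1 votes for the distractor
    [0.0, 0.0],  # neuron 2 does not affect the logits
]
scores = ablation_scores(hidden, w_out, correct=0)
supportive = [i for i, s in enumerate(scores) if s > 1e-9]
inhibitive = [i for i, s in enumerate(scores) if s < -1e-9]
```

In this toy case neuron 0 lands in `supportive` and neuron 1 in `inhibitive`, while neuron 2 scores zero. A single correct answer can also arise by chance, which is the fortuitous-behavior concern the abstract raises, so a score from one example is weak evidence on its own.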

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 2 · Confidence 5

Strengths

* This work is easy to follow and the motivation is clearly stated.
* Identifying both good and bad neurons is an interesting strategy.

Weaknesses

* This work is highly similar to an existing work, QR-Neuron [1], which severely undermines the novelty and contribution of this manuscript. (1) QR-Neuron is the first work that introduced multi-choice QA for neuron analyses, and this work claims that they propose this strategy and didn't give proper credit to prior work; (2) the proposed QATT also follows the core idea of the QR-Neuron work. I would suggest clarifying the distinction and novelty of QATT.
* the authors claim that *"Cross-Entropy-

Reviewer 02 · Rating 4 · Confidence 4

Strengths

- *Novel Conceptual Insight*: The introduction of “bad” (inhibitory) neurons alongside “good” ones, inspired by functional antagonism, provides a more holistic and biologically plausible view of LLM internal mechanisms.
- *Practical Framework Design*: The two-stage design (QATT + CNI) is well-motivated. QATT effectively standardizes diverse tasks, enabling consistent neuron analysis, while the ACE scoring in CNI naturally fits the QA format and captures both positive and negative contributions.
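To illustrate what standardizing a task into a QA format can look like, here is a minimal sketch that rewrites a labeled classification example as a multiple-choice prompt with lettered options. The prompt template, the function name `to_multiple_choice`, and the sentiment example are all assumptions made for illustration. The reviews do not show QATT's actual template or how it chooses distractors.

```python
from string import ascii_uppercase

def to_multiple_choice(instruction, text, labels, gold):
    """Format one classification example as a multiple-choice question.

    Returns the prompt string and the letter of the correct option.
    Options keep the order given in `labels` (no shuffling).
    """
    if gold not in labels:
        raise ValueError("gold label must be one of the candidate labels")
    if len(labels) > len(ascii_uppercase):
        raise ValueError("too many labels for single-letter options")
    lines = [instruction, f"Input: {text}", "Options:"]
    for letter, label in zip(ascii_uppercase, labels):
        lines.append(f"{letter}. {label}")
    lines.append("Answer:")
    answer = ascii_uppercase[labels.index(gold)]
    return "\n".join(lines), answer

# Hypothetical sentiment example.
prompt, answer = to_multiple_choice(
    "What is the sentiment of the input?",
    "The plot was dull and the acting was worse.",
    ["positive", "negative", "neutral"],
    "negative",
)
```

With a fixed option set, the model's output space reduces to a handful of answer letters, so the same attribution procedure can be applied across tasks. Open-ended generation has no natural option set, which is the gap this reviewer raises under Weaknesses.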

Weaknesses

- *Limited Scope of Tasks*: While the framework is proposed for task-level understanding, the evaluated tasks are all classification or multi-choice QA after transformation. It remains unclear how well NeuronLLM generalizes to true *generation* tasks where the output space is open-ended and the “distractor” choices in QATT are not naturally defined. Neuron-Level Knowledge Attribution in Large Language Models (Yu et al., 2024) claims that the active neurons are related to task domains and its for

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. The paper proposes the NeuronLLM framework, which leverages two opposing types of neurons: task-supporting “good” neurons and task-inhibiting “bad” neurons, to achieve overall task-level control of LLMs.
2. The paper introduces a Question-Answering-based Task Transformation module that unifies various tasks into a question-answering format, enabling NeuronLLM to interpret LLMs under different tasks.
3. The paper presents a contrastive neuron identification module, which uses a novel cross-entrop

Weaknesses

1. The paper uses performance change metrics RAC and RCC in the comparative experiments, which do not directly show the effects of Degrade and Enhance, i.e., whether the model’s performance decreases or improves. It is recommended to add metrics that can visually indicate performance increases and decreases.
2. The paper proposes a Question-Answering-based Task Transformation module, which unifies diverse tasks into a QA format to ensure that the neuron attribution method can model a consistent ou

Reviewer 04 · Rating 4 · Confidence 4

Strengths

1. Good writing and clear motivation: The paper explains, from both a biological and a mathematical perspective, why it is necessary not only to activate the supporting neurons but also to remove the neurons that degrade performance.

Weaknesses

1. Lack of innovation: The main method of the paper is based on a biological concept and existing methods for enhancing and reducing activation values. The biological concept mostly serves as a shell around existing theories, and the method lacks unique innovation.
2. The comparison of baselines lacks fairness: The main models you compared are Llama2-7B, Llama2-13B, and Baichuan2-7B. Why not use some newer and better open-source models, such as Qwen, Llama3, and other series of models? And

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks