Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models

Ameen Ali; Shahar Katz; Lior Wolf; Ivan Titov

arXiv:2507.09185·cs.CL·July 15, 2025

TL;DR

This paper presents a fine-tuning method that improves the generalization of large language models by identifying and pruning neurons tied to dataset-specific mechanisms, which improves performance on multiple-choice benchmarks.

Contribution

The paper introduces a neuron-pruning technique that uses Integrated Gradients to identify dataset-specific neurons and removes them during fine-tuning to improve LLM generalization.

Findings

01. Pruning improves model performance on multiple-choice benchmarks.

02. The method surpasses previous non-pruning adaptation techniques.

03. Pruning reduces reliance on dataset-specific correlations.

Abstract

Large language models (LLMs) often develop learned mechanisms specialized to specific datasets, such as reliance on domain-specific correlations, which yield high-confidence predictions without generalizable reasoning. While beneficial in one setting, these dataset-specific mechanisms typically degrade performance when models encounter novel tasks or distributions. In this work, we introduce a fine-tuning approach designed to enhance generalization by identifying and pruning neurons associated with dataset-specific mechanisms in transformer-based LLMs. Our method employs Integrated Gradients to quantify each neuron's influence on high-confidence predictions, pinpointing those that disproportionately contribute to dataset-specific performance without supporting robust, transferable reasoning. Selectively pruning these neurons compels the model to depend on generalizable representations.…
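The attribution-then-prune idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the single sigmoid readout, the all-zero activation baseline, the midpoint Riemann approximation, the top-k selection rule, and every name below (`confidence`, `integrated_gradients`, `prune_top_k`) are assumptions made for illustration. Integrated Gradients attributes a prediction F to neuron i as (h_i - h'_i) times the average of dF/dh_i along the straight path from a baseline h' to the actual activations h.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def confidence(h, w, b):
    """Toy readout: prediction confidence given hidden activations h."""
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)

def integrated_gradients(h, w, b, baseline=None, steps=200):
    """Approximate each neuron's IG attribution to the confidence.

    Averages the gradient at `steps` midpoints on the straight path
    from the baseline to h, then scales by (h_i - baseline_i).
    """
    if baseline is None:
        baseline = [0.0] * len(h)  # assumption: all-zero activation baseline
    grad_sum = [0.0] * len(h)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b0 + alpha * (hi - b0) for hi, b0 in zip(h, baseline)]
        p = confidence(point, w, b)
        # d sigmoid(w.h + b) / d h_i = p * (1 - p) * w_i
        for i, wi in enumerate(w):
            grad_sum[i] += p * (1.0 - p) * wi
    return [(hi - b0) * g / steps for hi, b0, g in zip(h, baseline, grad_sum)]

def prune_top_k(h, attributions, k):
    """Zero the k neurons with the largest attribution (hypothetical rule)."""
    ranked = sorted(range(len(h)), key=lambda i: attributions[i], reverse=True)
    pruned = set(ranked[:k])
    return [0.0 if i in pruned else hi for i, hi in enumerate(h)], sorted(pruned)

# Hypothetical activations and readout weights; neuron 0 dominates the logit.
h = [2.0, 0.1, -0.5, 0.3]
w = [3.0, 0.5, 1.0, -0.2]
b = -1.0

attr = integrated_gradients(h, w, b)
h_pruned, pruned_idx = prune_top_k(h, attr, k=1)
print("attributions:", [round(a, 4) for a in attr])
print("pruned neurons:", pruned_idx)
print("confidence before/after:", confidence(h, w, b), confidence(h_pruned, w, b))
```

In this toy setting the attributions sum approximately to the difference between the confidence at `h` and at the baseline (the completeness property of Integrated Gradients), and zeroing the top-attributed neuron lowers the confidence. How the paper selects neurons, layers, and thresholds in a real transformer is not specified in the abstract above, so none of those choices should be read off this sketch.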

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques