Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models
Ameen Ali, Shahar Katz, Lior Wolf, Ivan Titov

TL;DR
This paper presents a fine-tuning method that improves large language models' generalization by identifying and pruning neurons responsible for dataset-specific mechanisms, leading to better performance on diverse tasks.
Contribution
It introduces a neuron pruning technique using Integrated Gradients to enhance LLM generalization by removing dataset-specific neurons during fine-tuning.
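For reference, the standard Integrated Gradients attribution can be written per neuron as below. This is a sketch of the general definition, not the paper's exact formulation: the symbols a (the activation vector of a layer), a' (a baseline activation, commonly all zeros), F (the model output being explained, such as the probability of the predicted answer), and m (the number of interpolation steps) are notational assumptions introduced here.

```latex
\mathrm{IG}_j(a) \;=\; (a_j - a'_j) \int_0^1
  \frac{\partial F\bigl(a' + \alpha\,(a - a')\bigr)}{\partial a_j}\, d\alpha
\;\approx\; \frac{a_j - a'_j}{m} \sum_{k=1}^{m}
  \frac{\partial F\bigl(a' + \tfrac{k}{m}\,(a - a')\bigr)}{\partial a_j}
```

A useful sanity check is the completeness property: the attributions sum to F(a) - F(a'), so each neuron's score can be read as its share of the change in the output relative to the baseline.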
Findings
Pruning improves model performance on multiple-choice benchmarks.
The method surpasses previous non-pruning adaptation techniques.
Pruning reduces reliance on dataset-specific correlations.
Abstract
Large language models (LLMs) often develop learned mechanisms specialized to specific datasets, such as reliance on domain-specific correlations, which yield high-confidence predictions without generalizable reasoning. While beneficial in one setting, these dataset-specific mechanisms typically degrade performance when models encounter novel tasks or distributions. In this work, we introduce a fine-tuning approach designed to enhance generalization by identifying and pruning neurons associated with dataset-specific mechanisms in transformer-based LLMs. Our method employs Integrated Gradients to quantify each neuron's influence on high-confidence predictions, pinpointing those that disproportionately contribute to dataset-specific performance without supporting robust, transferable reasoning. Selectively pruning these neurons compels the model to depend on generalizable representations.…
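The scoring-then-pruning idea described in the abstract can be illustrated on a toy network. The following is a minimal sketch, assuming a small two-layer NumPy MLP in place of a transformer, a zero baseline for the hidden activations, and a "prune the k highest-scoring neurons" rule. The function names (`neuron_integrated_gradients`, `prune_top_neurons`), the toy dimensions, and the selection rule are all hypothetical; the abstract does not specify the authors' selection criterion or implementation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def neuron_integrated_gradients(h, W2, b2, target, steps=100):
    """Approximate Integrated Gradients of the target-class probability
    with respect to the hidden activations h, using a zero baseline
    (an assumption) and a midpoint Riemann sum over the path."""
    alphas = (np.arange(steps) + 0.5) / steps
    grad_sum = np.zeros_like(h)
    for a in alphas:
        p = softmax(W2 @ (a * h) + b2)
        onehot = np.zeros_like(p)
        onehot[target] = 1.0
        # d p[target] / d logits for a softmax output.
        dlogits = p[target] * (onehot - p)
        # Chain rule back to the hidden activations.
        grad_sum += W2.T @ dlogits
    return h * grad_sum / steps

def prune_top_neurons(W2, scores, k):
    """Zero the outgoing weights of the k neurons with the largest
    attribution scores (a hypothetical selection rule)."""
    idx = np.argsort(scores)[-k:]
    W2_pruned = W2.copy()
    W2_pruned[:, idx] = 0.0
    return W2_pruned, idx

# Toy model: 4 inputs, 8 hidden neurons, 3 answer choices.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.5, size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(scale=0.3, size=(3, 8)), np.zeros(3)

x = rng.normal(size=4)
h = np.maximum(W1 @ x + b1, 0.0)           # ReLU hidden activations
probs = softmax(W2 @ h + b2)
target = int(np.argmax(probs))             # the model's own prediction

scores = neuron_integrated_gradients(h, W2, b2, target)
W2_pruned, pruned_idx = prune_top_neurons(W2, scores, k=2)

print("attribution per neuron:", np.round(scores, 4))
print("pruned neuron indices:", pruned_idx)
print("target prob before/after:", probs[target],
      softmax(W2_pruned @ h + b2)[target])
```

In this sketch the scores come from a single input. A practical variant would presumably aggregate attributions over many high-confidence examples before choosing which neurons to prune, and would continue fine-tuning after pruning, as the abstract suggests.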
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
