TL;DR
This paper introduces NAG-based Ranking, a target-oriented data selection method for language models that uses neuron impact analysis to improve pretraining effectiveness and interpretability.
Contribution
The authors propose a novel neuron impact-based framework for target data selection that outperforms existing methods and provides interpretability of the pretraining process.
Findings
NAG-based Ranking improves target-oriented pretraining by 4.9% on average over random sampling across six benchmarks.
NAG outperforms state-of-the-art baselines by 5.3% accuracy on HellaSwag.
Deactivating NAG-selected neurons causes a 23.5% performance drop, showing their importance.
Abstract
Everyday tasks come with a target, and pretraining models around this target is what turns them into experts. In this paper, we study target-oriented language model (LM) pretraining by introducing Neuron-Activated Graph Ranking (NAG-based Ranking), a training-free and interpretable framework for target pretraining data selection. Rather than using black-box representations, our approach directly characterizes each target input by a sparse set of high-impact neurons in any off-the-shelf LLM. Concretely, we quantify neuron impact, select the most influential neurons across layers into a compact Neuron-Activated Graph (NAG), and rank candidate data by NAG similarity to target examples. We conduct experiments across six benchmarks, where our NAG-based Ranking improves target-oriented pretraining by 4.9% on average over random sampling, and also outperforms state-of-the-art baselines by…
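The select-then-rank pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes per-neuron impact scores are already available as a nested list `impact[layer][neuron]`, and it uses a per-layer top-k cut and Jaccard overlap as stand-ins for the paper's neuron selection and NAG similarity, which the abstract does not specify. The names `build_nag`, `nag_similarity`, and `rank_candidates` are hypothetical.

```python
from typing import List, Set, Tuple

Impact = List[List[float]]   # impact[layer][neuron]; assumed to be precomputed
NAG = Set[Tuple[int, int]]   # sparse set of (layer, neuron) pairs


def build_nag(impact: Impact, k: int) -> NAG:
    """Keep the k highest-impact neurons of every layer (assumed selection rule)."""
    nag: NAG = set()
    for layer, scores in enumerate(impact):
        order = sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)
        nag.update((layer, j) for j in order[:k])
    return nag


def nag_similarity(a: NAG, b: NAG) -> float:
    """Jaccard overlap between two NAGs (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(candidates: List[Impact], targets: List[Impact], k: int) -> List[int]:
    """Return candidate indices, most target-like first."""
    if not targets:
        raise ValueError("at least one target example is required")
    target_nags = [build_nag(t, k) for t in targets]

    def score(candidate: Impact) -> float:
        nag = build_nag(candidate, k)
        # Average similarity to all target examples.
        return sum(nag_similarity(nag, t) for t in target_nags) / len(target_nags)

    scores = [score(c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Toy data: two layers with three neurons each.
target = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.2]]
cand_a = [[0.7, 0.2, 0.1], [0.1, 0.9, 0.0]]   # same top neurons as the target
cand_b = [[0.0, 0.1, 0.9], [0.9, 0.0, 0.1]]   # no overlap with the target
cand_c = [[0.8, 0.0, 0.1], [0.7, 0.1, 0.2]]   # overlaps in layer 0 only

print(rank_candidates([cand_b, cand_a, cand_c], [target], k=1))  # [1, 2, 0]
```

Because each example is reduced to a small set of (layer, neuron) pairs, the ranking stays inspectable: the overlap between a candidate's NAG and a target's NAG shows which neurons drove the match.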
