Achieving Sparse Activation in Small Language Models
Jifeng Song, Kai Huang, Xiangyu Yin, Boyuan Yang, Wei Gao

TL;DR
This paper introduces a new attribution metric for sparse activation in small language models (SLMs). The metric enables 80% neuron sparsification with less than 5% accuracy loss, reducing computational cost without retraining.
Contribution
The paper proposes a new attribution metric that corrects the errors of existing attribution metrics, making sparse activation effective in small language models.
Findings
Achieves 80% sparsification with less than 5% accuracy loss.
Existing attribution metrics have large errors due to neuron interdependencies.
The proposed metric improves sparse activation precision in SLMs.
Abstract
Sparse activation, which selectively activates only an input-dependent set of neurons in inference, is a useful technique to reduce the computing cost of Large Language Models (LLMs) without retraining or adaptation efforts. However, whether it can be applied to the recently emerging Small Language Models (SLMs) remains questionable, because SLMs are generally less over-parameterized than LLMs. In this paper, we aim to achieve sparse activation in SLMs. We first show that the existing sparse activation schemes in LLMs that build on neurons' output magnitudes cannot be applied to SLMs, and that activating neurons based on their attribution scores is a better alternative. Further, we demonstrate and quantify the large errors of existing attribution metrics when used for sparse activation, due to the interdependency among attribution scores of neurons across different layers. Based on…
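To make the contrast in the abstract concrete, the following is a minimal sketch of the two selection rules it mentions: keeping neurons by output magnitude versus keeping them by an attribution score. It is not the paper's proposed metric, which the truncated abstract does not describe. The toy one-hidden-layer network, the gradient-times-output attribution, and all names (`top_k_mask`, `sparse_forward`, `W1`, `w2`) are assumptions made for illustration. With a single hidden layer, the sketch also does not reproduce the cross-layer interdependency the abstract refers to.

```python
import numpy as np

def top_k_mask(scores, keep_ratio):
    """Boolean mask that keeps the highest-scoring fraction of neurons."""
    k = max(1, int(round(keep_ratio * scores.size)))
    keep = np.argsort(scores)[-k:]          # indices of the k largest scores
    mask = np.zeros(scores.shape, dtype=bool)
    mask[keep] = True
    return mask

def sparse_forward(x, W1, w2, keep_ratio, metric):
    """Toy network y = w2 . relu(W1 x) with input-dependent neuron masking."""
    h = np.maximum(W1 @ x, 0.0)             # hidden-neuron outputs (ReLU)
    if metric == "magnitude":
        scores = np.abs(h)                  # rank neurons by output magnitude
    elif metric == "grad_x_output":
        # For y = w2 . h the gradient dy/dh_i is w2_i, so gradient-times-output
        # is h_i * w2_i (an illustrative attribution, not the paper's metric).
        scores = np.abs(h * w2)
    else:
        raise ValueError(f"unknown metric: {metric}")
    mask = top_k_mask(scores, keep_ratio)
    return float(w2 @ (h * mask)), mask     # masked neurons contribute nothing

# Hypothetical random weights and input, for illustration only.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(50, 8))
w2 = rng.normal(size=50)
x = rng.normal(size=8)

dense_output = float(w2 @ np.maximum(W1 @ x, 0.0))
for metric in ("magnitude", "grad_x_output"):
    y, mask = sparse_forward(x, W1, w2, keep_ratio=0.2, metric=metric)
    print(f"{metric}: kept {int(mask.sum())}/50 neurons, "
          f"abs. output error {abs(y - dense_output):.4f}")
```

A keep ratio of 0.2 corresponds to deactivating 80% of the hidden neurons for this input. The sketch only shows how the two rankings are computed and applied; which ranking gives the smaller error depends on the weights and the input.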
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
