Achieving Sparse Activation in Small Language Models

Jifeng Song; Kai Huang; Xiangyu Yin; Boyuan Yang; Wei Gao

arXiv:2406.06562 · cs.CL · June 12, 2024

Open Access · 1 Repo

TL;DR

This paper introduces a new attribution metric for sparse activation in small language models (SLMs). It enables 80% neuron sparsification with less than 5% accuracy loss, reducing inference cost without retraining.

Contribution

The paper proposes a new attribution metric that corrects the errors of existing attribution metrics, which arise from interdependency among neurons' attribution scores across layers, making sparse activation effective in SLMs.

Findings

01. Achieves 80% sparsification with less than 5% accuracy loss.

02. Existing attribution metrics have large errors due to neuron interdependencies.

03. The proposed metric improves sparse activation precision in SLMs.

Abstract

Sparse activation, which selectively activates only an input-dependent set of neurons during inference, is a useful technique for reducing the computing cost of Large Language Models (LLMs) without retraining or adaptation. However, whether it can be applied to the recently emerging Small Language Models (SLMs) remains questionable, because SLMs are generally less over-parameterized than LLMs. In this paper, we aim to achieve sparse activation in SLMs. We first show that existing sparse activation schemes for LLMs, which build on neurons' output magnitudes, cannot be applied to SLMs, and that activating neurons based on their attribution scores is a better alternative. Further, we demonstrate and quantify the large errors of existing attribution metrics when used for sparse activation, which are due to the interdependency among attribution scores of neurons across different layers. Based on…
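The contrast the abstract draws between magnitude-based and attribution-based neuron selection can be illustrated with a toy example. The sketch below is not the paper's metric: it uses a generic gradient-times-activation score on a hypothetical one-hidden-layer network with made-up weights (`W1`, `w2`, `x` are all assumptions), only to show why a neuron with a large output can still matter little to the final result.

```python
# Toy comparison of two ways to pick which hidden neurons stay active.
# All weights and inputs are hypothetical; the attribution score here is plain
# gradient x activation, NOT the corrected metric proposed in the paper.

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, x):
    return [sum(w * a for w, a in zip(row, x)) for row in W]

def top_k_mask(scores, keep):
    """Return a 0/1 mask that keeps the `keep` highest-scoring neurons."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(order[:keep])
    return [1.0 if i in kept else 0.0 for i in range(len(scores))]

def forward(W1, w2, x, mask=None):
    """Scalar output y = w2 . relu(W1 x), optionally with masked hidden neurons."""
    h = relu(matvec(W1, x))
    if mask is not None:
        h = [a * m for a, m in zip(h, mask)]
    return sum(w * a for w, a in zip(w2, h)), h

W1 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0], [0.1, 0.1]]
w2 = [0.01, 2.0, -0.02, 0.05, 3.0]
x = [1.0, 0.5]
keep = 2  # keep 2 of 5 hidden neurons, i.e. 60% sparsity

y_full, h = forward(W1, w2, x)

# Magnitude-based score: how large is each neuron's output?
magnitude = [abs(a) for a in h]

# Gradient x activation: dy/dh_i equals w2[i] for this linear readout.
attribution = [abs(a * g) for a, g in zip(h, w2)]

y_mag, _ = forward(W1, w2, x, top_k_mask(magnitude, keep))
y_attr, _ = forward(W1, w2, x, top_k_mask(attribution, keep))

print("full output:         ", y_full)
print("magnitude-based mask:", y_mag, "error", abs(y_mag - y_full))
print("attribution mask:    ", y_attr, "error", abs(y_attr - y_full))
```

In this toy setup the attribution-based mask keeps the output much closer to the dense result than the magnitude-based mask. With a single readout layer the score is an exact per-neuron contribution; in a multi-layer model, masking neurons in one layer changes the scores in later layers, which is the kind of cross-layer interdependency the abstract identifies as a source of error.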

Code & Models

Repositories

pittisl/sparse-activation (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Sparse Evolutionary Training