AVSS: Layer Importance Evaluation in Large Language Models via Activation Variance-Sparsity Analysis

Zichen Song; Yuxin Wu; Sitan Huang; Zhongfeng Kang

arXiv:2411.02117 · cs.CL · November 5, 2024

Open Access

TL;DR

This paper introduces AVSS, a metric combining activation variance and sparsity to evaluate layer importance in large language models, enabling effective model pruning without significant performance loss.

Contribution

It proposes a novel AVSS metric for assessing layer importance in LLMs and demonstrates effective pruning by removing less critical layers while maintaining performance.

Findings

01

Removing approximately the 25% of layers with the lowest AVSS retains over 90% of original model performance

02

AVSS effectively identifies non-essential layers in LLMs

03

Pruning based on AVSS improves model efficiency

Abstract

The evaluation of layer importance in deep learning has been an active area of research, with significant implications for model optimization and interpretability. Recently, large language models (LLMs) have gained prominence across various domains, yet limited studies have explored the functional importance and performance contributions of individual layers within LLMs, especially from the perspective of activation distribution. In this work, we propose the Activation Variance-Sparsity Score (AVSS), a novel metric combining normalized activation variance and sparsity to assess each layer's contribution to model performance. By identifying and removing approximately the lowest 25% of layers based on AVSS, we achieve over 90% of original model performance across tasks such as question answering, language modeling, and sentiment classification, indicating that these layers may be…
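The scoring and pruning idea described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the listing does not give the AVSS formula, so the sketch assumes a layer's score is its activation variance divided by its activation sparsity, normalized so the scores sum to 1 across layers. The function names `layer_scores` and `layers_to_prune`, the near-zero threshold `zero_tol`, and the stabilizer `eps` are hypothetical and not taken from the paper.

```python
import numpy as np


def layer_scores(activations, zero_tol=1e-6, eps=1e-8):
    """Score each layer from its recorded activations.

    ASSUMPTION: score = variance / sparsity, normalized over layers.
    The exact AVSS definition is not stated in this listing.
    `activations` is a list with one NumPy array per layer.
    """
    # Variance of all activation values in each layer.
    variances = np.array([np.var(a) for a in activations])
    # Sparsity: fraction of activations that are (near) zero.
    sparsities = np.array(
        [np.mean(np.abs(a) <= zero_tol) for a in activations]
    )
    # eps avoids division by zero for fully dense layers.
    raw = variances / (sparsities + eps)
    # Normalize so scores are comparable across layers.
    return raw / raw.sum()


def layers_to_prune(scores, fraction=0.25):
    """Return sorted indices of the lowest-scoring `fraction` of layers."""
    k = int(len(scores) * fraction)
    return sorted(np.argsort(scores)[:k].tolist())
```

In practice the per-layer activations would be collected by running the model on a calibration set, and the returned indices would name the candidate layers to remove before re-evaluating on downstream tasks.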

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling