First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models

Chi Ma; Mincong Huang; Ying Zhang; Chao Wang; Yujie Wang; Lei Yu; Chuan Liu; Wei Lin

arXiv:2408.11393·cs.CL·August 22, 2024

Open Access

TL;DR

This paper proposes a training-free, threshold-based dynamic activation method that leverages sequence information to improve inference efficiency in large language models by exploiting inherent sparsity, with theoretical analysis of key sparsity features.

Contribution

It introduces a novel training-free dynamic activation technique that enhances LLM inference speed and provides theoretical insights into model sparsity.

Findings

01 Accelerates generation speed by 18-25%

02 Maintains task performance with minimal compromise

03 Provides theoretical analysis of activation uncertainty and inertia

Abstract

Dynamic activation (DA) techniques, such as DejaVu and MoEfication, have demonstrated their potential to significantly enhance the inference efficiency of large language models (LLMs). However, these techniques often rely on ReLU activation functions or require additional parameters and training to maintain performance. This paper introduces a training-free Threshold-based Dynamic Activation (TDA) method that leverages sequence information to exploit the inherent sparsity of models across various architectures. This method is designed to accelerate generation speed by 18-25% without significantly compromising task performance, thereby addressing the limitations of existing DA techniques. Moreover, we delve into the root causes of LLM sparsity and theoretically analyze two of its critical features: history-related activation uncertainty and semantic-irrelevant activation inertia. Our…
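The abstract does not spell out the selection rule, so the following is only a rough sketch of what a training-free, threshold-based neuron selection could look like. It assumes a gated feed-forward block with a SiLU gate, a per-neuron L2 norm taken over the prompt tokens as the "sequence information", and a hand-picked `threshold`. All function and parameter names here are hypothetical and are not taken from the paper.

```python
import math

def silu(v):
    # SiLU activation: v * sigmoid(v)
    return v / (1.0 + math.exp(-v))

def matvec(x, W):
    # Row vector x (length n) times matrix W (n rows, m columns).
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def hidden(x, W_gate, W_up):
    # Gated-MLP hidden activations: silu(x W_gate) * (x W_up).
    g = matvec(x, W_gate)
    u = matvec(x, W_up)
    return [silu(a) * b for a, b in zip(g, u)]

def select_neurons(prompt, W_gate, W_up, threshold):
    # Assumption: a neuron is kept when the L2 norm of its activations
    # over all prompt tokens reaches the threshold.
    acts = [hidden(x, W_gate, W_up) for x in prompt]
    n_hidden = len(acts[0])
    norms = [math.sqrt(sum(a[j] ** 2 for a in acts)) for j in range(n_hidden)]
    return [j for j in range(n_hidden) if norms[j] >= threshold]

def sparse_ffn(x, W_gate, W_up, W_down, keep):
    # Forward pass that only computes the kept neurons; skipped neurons
    # contribute nothing, which is where the compute saving would come from.
    out = [0.0] * len(W_down[0])
    for j in keep:
        g = sum(x[i] * W_gate[i][j] for i in range(len(x)))
        u = sum(x[i] * W_up[i][j] for i in range(len(x)))
        h = silu(g) * u
        for k in range(len(out)):
            out[k] += h * W_down[j][k]
    return out
```

In this sketch, `select_neurons` would run once on the prompt, and the resulting index list would be reused by `sparse_ffn` for each generated token. A threshold of zero keeps every neuron and reproduces the dense output, so the threshold acts as the knob that trades accuracy for speed.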

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems