Dynamic Activation Pitfalls in LLaMA Models: An Empirical Study

Chi Ma; Mincong Huang; Chao Wang; Yujie Wang; Lei Yu

arXiv:2405.09274 · cs.LG · May 16, 2024


TL;DR

This paper empirically investigates dynamic activation mechanisms in LLaMA models, revealing significant pitfalls and underperformance issues compared to static ReLU activations, especially in high sparsity scenarios.
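The contrast with ReLU comes down to how many activations are exactly zero and can therefore be skipped for free. ReLU maps every negative pre-activation to an exact zero, while SiLU (the gate used in LLaMA's SwiGLU feed-forward blocks) almost never outputs zero. The sketch below measures this under a loud assumption: i.i.d. standard-normal samples stand in for real FFN pre-activations. The names `sparsity_ratio` and `compare` are hypothetical helpers for illustration, not code from the paper.

```python
import math
import random


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def silu(x: float) -> float:
    # SiLU / Swish: x * sigmoid(x), split by sign so math.exp never overflows
    if x >= 0.0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def sparsity_ratio(values, tol: float = 0.0) -> float:
    # Fraction of activations with |v| <= tol; tol=0.0 counts exact zeros only
    values = list(values)
    return sum(1 for v in values if abs(v) <= tol) / len(values)


def compare(n: int = 10_000, seed: int = 0, tol: float = 0.0) -> dict:
    rng = random.Random(seed)
    # Hypothetical stand-in for FFN pre-activations: standard-normal samples
    pre = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return {
        "relu": sparsity_ratio((relu(x) for x in pre), tol),
        "silu": sparsity_ratio((silu(x) for x in pre), tol),
    }


if __name__ == "__main__":
    print(compare())
    print(compare(tol=0.05))
```

With `tol=0.0`, ReLU zeroes roughly half of these inputs while SiLU zeroes none; raising `tol` treats near-zero SiLU outputs as skippable, which recovers some sparsity only by accepting approximation error.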

Contribution

The paper systematically analyzes the limitations of current dynamic activation schemes in LLaMA models and proposes directions for future sparsity strategies.

Findings

01 Dynamic activation often underperforms ReLU in LLaMA models.

02 The complexity of predicting which activation components (heads and neurons) to compute hampers effectiveness.

03 Inadequate sparsity from the activation function and information loss from KV cache skipping degrade model performance.
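The role of prediction in the findings above can be made concrete with a toy ReLU feed-forward layer. This is a generic illustration of predictor-based dynamic activation, not the paper's implementation: a predictor scores each intermediate neuron, only the top-k are computed, and the result is compared with the dense layer. The "oracle" predictor (ranking by the true pre-activation), the noisy predictor, the layer sizes, and the helper names `ffn`, `top_k`, and `demo` are all hypothetical.

```python
import math
import random


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def ffn(x, w_in, w_out, keep=None):
    # Toy ReLU FFN: y = sum_i relu(w_in[i] . x) * w_out[i].
    # If `keep` is given, only those intermediate neurons are computed.
    idx = range(len(w_in)) if keep is None else sorted(keep)
    y = [0.0] * len(w_out[0])
    for i in idx:
        h = relu(sum(a * b for a, b in zip(w_in[i], x)))
        if h != 0.0:  # a zero ReLU output contributes nothing
            y = [yj + h * wj for yj, wj in zip(y, w_out[i])]
    return y


def top_k(scores, k):
    # Indices of the k highest-scoring neurons
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def rel_error(approx, exact):
    num = math.sqrt(sum((a - e) ** 2 for a, e in zip(approx, exact)))
    den = math.sqrt(sum(e * e for e in exact)) or 1.0
    return num / den


def demo(d_model=16, d_ff=64, k=48, noise_std=4.0, seed=0):
    rng = random.Random(seed)
    w_in = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(d_ff)]
    w_out = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(d_ff)]
    x = [rng.gauss(0, 1) for _ in range(d_model)]
    exact = ffn(x, w_in, w_out)
    pre = [sum(a * b for a, b in zip(row, x)) for row in w_in]
    # Hypothetical perfect predictor: ranks neurons by their true pre-activation
    oracle = top_k(pre, k)
    # Hypothetical imperfect predictor: true score plus Gaussian noise
    noisy = top_k([p + rng.gauss(0, noise_std) for p in pre], k)
    return {
        "active": sum(p > 0.0 for p in pre),
        "oracle_err": rel_error(ffn(x, w_in, w_out, oracle), exact),
        "noisy_err": rel_error(ffn(x, w_in, w_out, noisy), exact),
    }


if __name__ == "__main__":
    print(demo())
```

When `k` is at least the number of positive pre-activations, the oracle skips only neurons whose ReLU output is zero, so the sparse result matches the dense one exactly; any ranking mistake by a learned predictor drops a nonzero contribution and reintroduces error.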

Abstract

In this work, we systematically investigate the efficacy of dynamic activation mechanisms within the LLaMA family of language models. Despite the potential of dynamic activation methods to reduce computation and increase speed in models using the ReLU activation function, our empirical findings have uncovered several inherent pitfalls in the current dynamic activation schemes. Through extensive experiments across various dynamic activation strategies, we demonstrate that LLaMA models usually underperform when compared to their ReLU counterparts, particularly in scenarios demanding a high sparsity ratio. We attribute these deficiencies to a combination of factors: 1) the inherent complexity of dynamically predicting activation heads and neurons; 2) the inadequate sparsity resulting from activation functions; 3) the insufficient preservation of information resulting from KV cache skipping.…
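The third factor, information loss from KV cache skipping, can be illustrated with single-head attention. The abstract does not spell out the skipping policy, so the sketch assumes the simplest reading: when a token's computation is skipped at some layer or head, its key and value are never written to the cache, and every later query there attends over an incomplete context. The helpers `attend`, `attend_with_skips`, and `demo`, and the random data, are hypothetical.

```python
import math
import random


def attend(q, keys, values):
    # Scaled dot-product attention for one query over cached (key, value) pairs.
    # Assumes at least one cached pair.
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        out = [o + (w / total) * vj for o, vj in zip(out, v)]
    return out


def attend_with_skips(q, keys, values, skipped):
    # Hypothetical skipping policy: positions in `skipped` never wrote K/V
    # to the cache, so the query simply cannot attend to them.
    kept = [i for i in range(len(keys)) if i not in skipped]
    return attend(q, [keys[i] for i in kept], [values[i] for i in kept])


def demo(seq_len=16, d=8, n_skip=4, seed=0):
    rng = random.Random(seed)
    keys = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(seq_len)]
    values = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(seq_len)]
    q = [rng.gauss(0, 1) for _ in range(d)]
    full = attend(q, keys, values)
    skipped = set(rng.sample(range(seq_len), n_skip))
    partial = attend_with_skips(q, keys, values, skipped)
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(partial, full)))
    return full, partial, err


if __name__ == "__main__":
    print(demo()[2])
```

Because softmax weights are strictly positive, every cached token contributes something to the output, so dropping any of them from the cache shifts the result; unlike a zero ReLU output, a skipped KV entry is never free.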

