Dynamic Activation Pitfalls in LLaMA Models: An Empirical Study
Chi Ma, Mincong Huang, Chao Wang, Yujie Wang, Lei Yu

TL;DR
This paper empirically investigates dynamic activation mechanisms in LLaMA models and finds that, under dynamic activation, LLaMA models usually underperform their ReLU counterparts, especially in high-sparsity scenarios.
Contribution
It systematically analyzes the limitations of current dynamic activation schemes in LLaMA models and proposes directions for improving future sparsity strategies.
Findings
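Under dynamic activation, LLaMA models usually underperform their ReLU counterparts, particularly at high sparsity ratios.
The complexity of dynamically predicting which heads and neurons to activate hampers effectiveness.
Activation functions yield inadequate sparsity, and KV cache skipping preserves too little information; both degrade model performance.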
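The point about inadequate sparsity can be illustrated with a small sketch. It is not taken from the paper: the pre-activations are synthetic Gaussian samples, and the helper names (`exact_zero_fraction`, `below_threshold_fraction`) and the 0.05 threshold are assumptions chosen for illustration. It compares ReLU with SiLU, the activation used in LLaMA's feed-forward blocks, and counts how many outputs are exactly zero and could therefore be skipped without any loss.

```python
import math
import random


def relu(x: float) -> float:
    """ReLU: every negative input maps to exactly 0.0."""
    return max(0.0, x)


def silu(x: float) -> float:
    """SiLU (x * sigmoid(x)): negative inputs give small non-zero outputs."""
    return x / (1.0 + math.exp(-x))


def exact_zero_fraction(values):
    """Fraction of outputs that are exactly zero (free to skip)."""
    return sum(1 for v in values if v == 0.0) / len(values)


def below_threshold_fraction(values, threshold):
    """Fraction of outputs with magnitude below a hypothetical cutoff."""
    return sum(1 for v in values if abs(v) < threshold) / len(values)


# Synthetic pre-activations; real model statistics will differ.
random.seed(0)
pre_activations = [random.gauss(0.0, 1.0) for _ in range(10_000)]

relu_out = [relu(x) for x in pre_activations]
silu_out = [silu(x) for x in pre_activations]

relu_sparsity = exact_zero_fraction(relu_out)
silu_sparsity = exact_zero_fraction(silu_out)
silu_near_zero = below_threshold_fraction(silu_out, 0.05)

print(f"ReLU exact zeros:      {relu_sparsity:.2%}")
print(f"SiLU exact zeros:      {silu_sparsity:.2%}")
print(f"SiLU |output| < 0.05:  {silu_near_zero:.2%}")
```

On zero-centered inputs, ReLU zeroes roughly half of the neurons, while SiLU produces essentially no exact zeros. A dynamic activation scheme on a SiLU model therefore has to impose a cutoff and accept some approximation error for every neuron it skips.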
Abstract
In this work, we systematically investigate the efficacy of dynamic activation mechanisms within the LLaMA family of language models. Despite the potential of dynamic activation methods to reduce computation and increase speed in models using the ReLU activation function, our empirical findings uncover several inherent pitfalls in current dynamic activation schemes. Through extensive experiments across various dynamic activation strategies, we demonstrate that LLaMA models usually underperform their ReLU counterparts, particularly in scenarios demanding high sparsity ratios. We attribute these deficiencies to a combination of factors: 1) the inherent complexity of dynamically predicting activation heads and neurons; 2) the inadequate sparsity resulting from activation functions; 3) the insufficient preservation of information resulting from KV cache skipping.…
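To make the cost of skipping neurons concrete, here is a minimal sketch, not the authors' method. It builds a tiny, ungated feed-forward layer with random weights (a simplification of LLaMA's gated feed-forward block), keeps only the largest-magnitude hidden neurons, and measures the relative output error. The function names, layer sizes, and the 75% keep fraction are all assumptions. The selection is an oracle that sees the true activations; a learned predictor, as in practical dynamic activation schemes, could only do worse.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def silu(x):
    return x / (1.0 + math.exp(-x))


def ffn_hidden(x, w_in, act):
    """Hidden activations act(W_in x) of a plain (ungated) FFN."""
    return [act(sum(w * xi for w, xi in zip(row, x))) for row in w_in]


def keep_top_k(hidden, k):
    """Zero out all but the k largest-magnitude hidden neurons (oracle choice)."""
    order = sorted(range(len(hidden)), key=lambda j: abs(hidden[j]), reverse=True)
    keep = set(order[:k])
    return [h if j in keep else 0.0 for j, h in enumerate(hidden)]


def project(hidden, w_out):
    """Output projection W_out h."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def relative_error(dense, sparse):
    num = math.sqrt(sum((d - s) ** 2 for d, s in zip(dense, sparse)))
    den = math.sqrt(sum(d * d for d in dense))
    return num / den


def skip_error(act, keep_fraction, d_model=16, d_hidden=256, seed=0):
    """Relative output error when only a fraction of hidden neurons is computed."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
    w_in = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(d_hidden)]
    w_out = [[rng.gauss(0.0, 1.0) for _ in range(d_hidden)] for _ in range(d_model)]
    hidden = ffn_hidden(x, w_in, act)
    dense = project(hidden, w_out)
    sparse = project(keep_top_k(hidden, int(keep_fraction * d_hidden)), w_out)
    return relative_error(dense, sparse)


relu_err = skip_error(relu, keep_fraction=0.75)
silu_err = skip_error(silu, keep_fraction=0.75)
print(f"ReLU relative error: {relu_err:.6f}")
print(f"SiLU relative error: {silu_err:.6f}")
```

In this toy setting the ReLU layer loses nothing, because the dropped neurons were already exactly zero, whereas the SiLU layer incurs a non-zero error. This only illustrates the mechanism; it says nothing about the magnitudes observed in the paper's experiments.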
Taxonomy
Topics: Simulation Techniques and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · LLaMA
