MOYU: A Theoretical Study on Massive Over-activation Yielded Uplifts in LLMs
Chi Ma, Mincong Huang, Chao Wang, Yujie Wang, Lei Yu

TL;DR
This paper provides a theoretical analysis of the MOYU property in large language models, identifying key limitations of current dynamic activation methods and suggesting directions for future improvements.
Contribution
It elucidates the root causes of MOYU's limitations and offers insights to refine sparsity schemes in large language models.
Findings
Identifies history-related activation uncertainty as a limitation.
Highlights semantic-irrelevant activation inertia.
Provides theoretical foundations for future sparsity improvements.
Abstract
Massive Over-activation Yielded Uplifts (MOYU) is an inherent property of large language models, and dynamic activation (DA) based on the MOYU property is a clever yet under-explored strategy designed to accelerate inference in these models. Existing methods that utilize MOYU often face a significant 'Impossible Trinity': struggling to simultaneously maintain model performance, enhance inference speed, and extend applicability across various architectures. Due to the theoretical ambiguities surrounding MOYU, this paper elucidates the root cause of the MOYU property and outlines the mechanisms behind two primary limitations encountered by current DA methods: 1) history-related activation uncertainty, and 2) semantic-irrelevant activation inertia. Our analysis not only underscores the limitations of current dynamic activation strategies within large-scale LLaMA models but also proposes…
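To make the idea of dynamic activation concrete, here is a minimal sketch of one common variant: skipping feed-forward neurons whose gate activation is small for the current input. The abstract does not say which selection rule the paper analyzes, so the magnitude threshold, the LLaMA-style gated MLP layout, and every name below (`gated_mlp`, `threshold`, `rand_matrix`) are assumptions made for illustration only, not the authors' method.

```python
import math
import random


def silu(z):
    # SiLU activation, as used in LLaMA-style gated MLPs.
    return z / (1.0 + math.exp(-z))


def gated_mlp(x, W_gate, W_up, W_down, threshold=0.0):
    """Gated MLP that skips neurons with |silu(gate)| <= threshold.

    Hypothetical layout: W_gate[j] and W_up[j] are the input weights of
    neuron j, and W_down[j] is neuron j's contribution to the output.
    Returns (output_vector, number_of_active_neurons).
    """
    gate = [silu(sum(w * v for w, v in zip(row, x))) for row in W_gate]
    # Dynamic activation: the active set depends on the current input x.
    active = [j for j, g in enumerate(gate) if abs(g) > threshold]
    out = [0.0] * len(W_down[0])
    for j in active:
        # The up-projection is only computed for the selected neurons.
        h = gate[j] * sum(w * v for w, v in zip(W_up[j], x))
        for i, w in enumerate(W_down[j]):
            out[i] += h * w
    return out, len(active)


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d_model, d_ff = 8, 32  # toy sizes, chosen arbitrarily
W_gate = rand_matrix(d_ff, d_model)
W_up = rand_matrix(d_ff, d_model)
W_down = rand_matrix(d_ff, d_model)
x = [random.gauss(0.0, 1.0) for _ in range(d_model)]

# threshold=0.0 only drops neurons that contribute exactly zero.
dense, n_dense = gated_mlp(x, W_gate, W_up, W_down, threshold=0.0)
sparse, n_sparse = gated_mlp(x, W_gate, W_up, W_down, threshold=0.1)

diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(dense, sparse)))
rel_err = diff / math.sqrt(sum(a * a for a in dense))
print(f"active neurons: {n_sparse}/{n_dense}, relative error: {rel_err:.4f}")
```

Raising `threshold` skips more neurons and saves more compute, at the cost of a larger deviation from the dense output. That is the performance-versus-speed tension the abstract refers to as part of the 'Impossible Trinity'.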
Taxonomy
Topics: Advanced Surface Polishing Techniques · Advanced materials and composites
Methods: LLaMA
