A Sparsity Predicting Approach for Large Language Models via Activation Pattern Clustering
Nobel Dhar, Bobin Deng, Md Romyull Islam, Xinyue Zhang, Kazi Fahim Ahmad Nasif, and Kun Suo

TL;DR
This paper introduces a clustering-based method for predicting activation patterns in large language models, aiming to make activation sparsity cheaper to exploit while keeping perplexity degradation minimal.
Contribution
It proposes a scalable clustering framework for activation pattern compression that outperforms standard binary clustering approaches and is intended to enable efficient sparse inference in large language models.
Findings
Achieves up to 79.34% clustering precision.
Maintains minimal perplexity (PPL) degradation, with PPL scores as low as 12.49.
Demonstrates potential for efficient sparse computation in LLMs.
Abstract
Large Language Models (LLMs) exhibit significant activation sparsity, where only a subset of neurons is active for a given input. Although this sparsity presents opportunities to reduce computational cost, efficiently utilizing it requires predicting activation patterns in a scalable manner. However, direct prediction at the neuron level is computationally expensive due to the vast number of neurons in modern LLMs. To enable efficient prediction and utilization of activation sparsity, we propose a clustering-based activation pattern compression framework. Instead of treating each neuron independently, we group similar activation patterns into a small set of representative clusters. Our method achieves up to 79.34% clustering precision, outperforming standard binary clustering approaches while maintaining minimal degradation in perplexity (PPL) scores. With a sufficiently large number…
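To make the idea of grouping activation patterns into representative clusters concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes that activation patterns are 0/1 vectors (one entry per neuron), that clustering uses Hamming distance with majority-vote centroids (a k-modes style stand-in), and that "precision" means the fraction of neurons marked active by the assigned centroid that are truly active. The paper may define each of these differently. All function names and the synthetic data generator are hypothetical.

```python
import numpy as np


def make_synthetic_patterns(n=600, d=256, k=4, active=0.2, flip=0.05, seed=0):
    """Hypothetical data: n noisy copies of k sparse 0/1 prototype patterns."""
    rng = np.random.default_rng(seed)
    prototypes = rng.random((k, d)) < active      # sparse "true" patterns
    which = rng.integers(0, k, size=n)            # prototype used per sample
    noise = rng.random((n, d)) < flip             # random bit flips
    return np.logical_xor(prototypes[which], noise).astype(np.uint8)


def assign_clusters(x, centroids):
    """Assign each 0/1 row of x to its nearest centroid in Hamming distance."""
    # For 0/1 vectors: hamming(x, c) = |x| + |c| - 2 * <x, c>
    dist = x.sum(1)[:, None] + centroids.sum(1)[None, :] - 2 * (x @ centroids.T)
    return dist.argmin(axis=1)


def cluster_binary_patterns(patterns, k, iters=20, seed=0):
    """k-modes style clustering (an assumed stand-in, not the paper's method)."""
    rng = np.random.default_rng(seed)
    x = patterns.astype(np.int32)
    # Initialise centroids from k distinct rows chosen at random.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign_clusters(x, centroids)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                # Majority vote: a neuron is "on" if active in >= half the members.
                centroids[j] = members.mean(axis=0) >= 0.5
    return centroids, assign_clusters(x, centroids)


def activation_precision(patterns, centroids, labels):
    """Assumed metric: of the neurons a centroid predicts active, how many are."""
    predicted = centroids[labels].astype(bool)
    actual = patterns.astype(bool)
    true_pos = np.logical_and(predicted, actual).sum()
    return true_pos / max(predicted.sum(), 1)


patterns = make_synthetic_patterns()
centroids, labels = cluster_binary_patterns(patterns, k=8)
print("precision:", activation_precision(patterns, centroids, labels))
```

Under these assumptions, a predictor only has to pick one of k cluster indices per input rather than one bit per neuron, which is the source of the compression described in the abstract. Increasing k would be expected to trade a larger prediction target for centroids that match individual patterns more closely.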
