A Sparsity Predicting Approach for Large Language Models via Activation Pattern Clustering

Nobel Dhar, Bobin Deng, Md Romyull Islam, Xinyue Zhang, Kazi Fahim Ahmad Nasif, and Kun Suo

arXiv:2507.14179 · cs.LG · July 22, 2025


TL;DR

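This paper introduces a clustering-based method for predicting activation patterns in large language models. By predicting a small set of representative cluster patterns instead of individual neuron activations, it aims to make activation sparsity cheaper to exploit while keeping perplexity degradation minimal.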

Contribution

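It proposes a scalable clustering framework for activation pattern compression that groups similar activation patterns into a small set of representative clusters. The framework outperforms standard binary clustering approaches and is intended to support efficient inference in large language models.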

Findings

01. Achieves up to 79.34% clustering precision, outperforming standard binary clustering approaches.

02. Keeps perplexity (PPL) degradation minimal, with PPL scores as low as 12.49.

03. Demonstrates potential for efficient sparse computation in LLMs.

Abstract

Large Language Models (LLMs) exhibit significant activation sparsity, where only a subset of neurons are active for a given input. Although this sparsity presents opportunities to reduce computational cost, efficiently utilizing it requires predicting activation patterns in a scalable manner. However, direct prediction at the neuron level is computationally expensive due to the vast number of neurons in modern LLMs. To enable efficient prediction and utilization of activation sparsity, we propose a clustering-based activation pattern compression framework. Instead of treating each neuron independently, we group similar activation patterns into a small set of representative clusters. Our method achieves up to 79.34% clustering precision, outperforming standard binary clustering approaches while maintaining minimal degradation in perplexity (PPL) scores. With a sufficiently large number…
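The idea of grouping similar activation patterns into a few representative clusters can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes activation patterns are stored as 0/1 vectors, swaps in a plain k-means-style loop with Hamming distance and majority-vote centroids, and uses a hypothetical `activation_precision` helper that measures the fraction of neurons predicted active by the assigned centroid that are truly active. The paper's own definition of clustering precision may differ.

```python
import numpy as np


def assign_clusters(patterns, centroids):
    """Assign each binary pattern to its nearest centroid by Hamming distance."""
    # dists has shape (n_samples, k): number of mismatched neurons per pair.
    dists = (patterns[:, None, :] != centroids[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1)


def cluster_binary_patterns(patterns, k, iters=20, seed=0):
    """K-means-style clustering of 0/1 activation patterns (illustrative only).

    patterns: array of shape (n_samples, n_neurons) with values in {0, 1}.
    Returns (centroids, labels); centroids are binary, shape (k, n_neurons).
    """
    rng = np.random.default_rng(seed)
    patterns = np.asarray(patterns, dtype=np.uint8)
    # Assumption: initialise centroids from k distinct random samples.
    init_idx = rng.choice(len(patterns), size=k, replace=False)
    centroids = patterns[init_idx].copy()
    for _ in range(iters):
        labels = assign_clusters(patterns, centroids)
        for c in range(k):
            members = patterns[labels == c]
            if len(members):
                # Majority vote per neuron gives the representative pattern.
                centroids[c] = (members.mean(axis=0) >= 0.5).astype(np.uint8)
    # Final assignment so labels match the returned centroids.
    return centroids, assign_clusters(patterns, centroids)


def activation_precision(patterns, centroids, labels):
    """Hypothetical metric: TP / (TP + FP) over neurons predicted active."""
    predicted = centroids[labels].astype(bool)
    actual = np.asarray(patterns).astype(bool)
    tp = np.logical_and(predicted, actual).sum()
    fp = np.logical_and(predicted, ~actual).sum()
    return float(tp / (tp + fp)) if (tp + fp) else 0.0


if __name__ == "__main__":
    # Synthetic data: noisy copies of 4 random prototype patterns.
    rng = np.random.default_rng(1)
    prototypes = rng.integers(0, 2, size=(4, 64), dtype=np.uint8)
    samples = prototypes[rng.integers(0, 4, size=200)]
    noise = rng.random(samples.shape) < 0.05
    samples = np.where(noise, 1 - samples, samples).astype(np.uint8)

    centroids, labels = cluster_binary_patterns(samples, k=4)
    print("precision:", activation_precision(samples, centroids, labels))
```

At inference time, a predictor under this scheme would only need to choose one of `k` cluster indices per input rather than one bit per neuron, which is where the compression comes from.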
