Activation Sparsity Opportunities for Compressing General Large Language Models

Nobel Dhar; Bobin Deng; Md Romyull Islam; Kazi Fahim Ahmad Nasif; Liang Zhao; Kun Suo

arXiv:2412.12178 · cs.LG · February 4, 2025

TL;DR

This paper explores activation sparsity as a way to significantly compress large language models for edge devices, achieving around 50% reduction in memory and computation in the Feed-Forward Network (FFN) components with minimal accuracy loss.

Contribution

It systematically investigates activation sparsity in LLMs, providing a practical guideline for system optimization and demonstrating effective compression of FFN components.

Findings

01. Achieves ~50% memory and computation reduction in FFN components.
02. Negligible accuracy degradation with increased activation sparsity.
03. Provides a system prediction guideline for efficient LLM deployment.
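Why activation sparsity translates into FFN memory and computation savings can be illustrated with a toy example. This is a minimal sketch, not the paper's implementation. It assumes a plain ReLU FFN, where inactive neurons are exactly zero and skipping them is lossless. Many current LLMs use SiLU/GELU-style activations instead, where near-zero outputs would have to be thresholded and skipping becomes an approximation. The names `dense_ffn`, `sparse_ffn` and `demo` are hypothetical. The active set is an oracle taken from the dense pass and stands in for the predictor a deployed system would use.

```python
import random


def relu(v):
    # ReLU makes inactive neurons exactly zero, so skipping them is lossless.
    return v if v > 0.0 else 0.0


def dense_ffn(x, w_up, w_down):
    """Dense two-layer FFN: every one of the d_ff neurons is computed.

    w_up[i]   : input weights of neuron i (length d_model)
    w_down[i] : output weights of neuron i (length d_model)
    """
    d_model = len(w_down[0])
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in w_up]
    y = [0.0] * d_model
    for h, out_row in zip(hidden, w_down):
        for j in range(d_model):
            y[j] += h * out_row[j]
    return y, hidden


def sparse_ffn(x, w_up, w_down, active):
    """Compute only the neurons listed in `active`; all others are treated as zero.

    Only len(active) rows of w_up and w_down are read, so weight traffic and
    multiply-adds shrink in proportion to the activation sparsity.
    """
    d_model = len(w_down[0])
    y = [0.0] * d_model
    for i in active:
        h = relu(sum(w * xi for w, xi in zip(w_up[i], x)))
        for j in range(d_model):
            y[j] += h * w_down[i][j]
    return y


def demo(d_model=16, d_ff=64, seed=0):
    rng = random.Random(seed)
    w_up = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(d_ff)]
    w_down = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(d_ff)]
    x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]

    y_dense, hidden = dense_ffn(x, w_up, w_down)
    # Oracle active set taken from the dense pass (hypothetical stand-in);
    # a deployed system would predict it cheaply before touching FFN weights.
    active = [i for i, h in enumerate(hidden) if h > 0.0]
    y_sparse = sparse_ffn(x, w_up, w_down, active)
    sparsity = 1.0 - len(active) / d_ff
    return y_dense, y_sparse, sparsity


if __name__ == "__main__":
    y_dense, y_sparse, sparsity = demo()
    max_err = max(abs(a - b) for a, b in zip(y_dense, y_sparse))
    print(f"activation sparsity: {sparsity:.2f}, max |dense - sparse|: {max_err:.1e}")
```

With random Gaussian weights, roughly half of the pre-activations are negative, so the toy example typically lands somewhere around 50% sparsity. That is an artifact of the random initialization, not a reproduction of the paper's measurements.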

Abstract

Deploying local AI models, such as Large Language Models (LLMs), to edge devices can substantially enhance devices' standalone capabilities, reduce the load on servers, and lower response time. Given this potential, many large technology companies have released lightweight Small Language Models (SLMs) to bridge this gap. However, there is still strong motivation to deploy more powerful LLMs on edge devices and raise their level of intelligence. Unlike conventional approaches to AI model compression, we investigate activation sparsity. Activation sparsity is orthogonal to, and combinable with, existing techniques, so it can help maximize the compression rate while maintaining high accuracy. Because LLMs' Feed-Forward Network (FFN) components typically comprise a large proportion of parameters (around 2/3), our FFN optimizations would have a better…
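The "around 2/3" figure can be checked with a back-of-the-envelope parameter count. This is a minimal sketch that assumes a standard decoder layer with full multi-head attention (four d_model × d_model projections) and ignores biases, normalization layers and embeddings. The LLaMA-7B dimensions used below (d_model = 4096, d_ff = 11008, gated SwiGLU FFN) come from that model's public configuration, not from this paper. The helper names `layer_params` and `ffn_fraction` are hypothetical.

```python
from fractions import Fraction


def layer_params(d_model, d_ff, gated):
    """Weight counts for one decoder layer, ignoring biases, norms and embeddings."""
    attn = 4 * d_model * d_model      # W_Q, W_K, W_V, W_O (full multi-head attention, no GQA)
    n_mats = 3 if gated else 2        # gated FFNs (e.g. SwiGLU) add a gate projection
    ffn = n_mats * d_model * d_ff
    return attn, ffn


def ffn_fraction(d_model, d_ff, gated):
    """Exact share of per-layer weights that live in the FFN."""
    attn, ffn = layer_params(d_model, d_ff, gated)
    return Fraction(ffn, attn + ffn)


# Classic FFN with d_ff = 4 * d_model: 8d^2 / (4d^2 + 8d^2) = 2/3
print(ffn_fraction(1024, 4096, gated=False))         # 2/3
# Gated FFN with d_ff = (8/3) * d_model keeps the same 8d^2 budget
print(ffn_fraction(768, 2048, gated=True))           # 2/3
# LLaMA-7B-style dimensions: 129/193, about 0.668
print(float(ffn_fraction(4096, 11008, gated=True)))
```

With grouped-query attention, the K and V projections shrink, which pushes the FFN share above 2/3.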
