SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference

Jiho Shin; Hoeseok Yang; Youngmin Yi

arXiv:2411.12692 · cs.PF · January 27, 2025

TL;DR

SparseInfer is a training-free predictor of activation sparsity in ReLU-based large language models (LLMs). Because it predicts sparsity by comparing only the sign bits of inputs and weights, it enables faster inference with minimal accuracy loss.
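
To make "activation sparsity" concrete, here is a short Python illustration. It is not taken from the paper, and the pre-activation values are invented. It counts the exact zeros that ReLU and SiLU produce. ReLU maps every negative pre-activation to exactly 0.0, so those neurons contribute nothing and their computation can be skipped. SiLU maps negative inputs to small nonzero values, so no neuron can be skipped exactly.

```python
import math

def relu(z: float) -> float:
    # ReLU clamps every negative pre-activation to exactly 0.0.
    return z if z > 0.0 else 0.0

def silu(z: float) -> float:
    # SiLU(z) = z * sigmoid(z); negative inputs give small negative values, not zeros.
    return z / (1.0 + math.exp(-z))

def sparsity(values: list[float]) -> float:
    # Fraction of activations that are exactly zero (and could therefore be skipped).
    return sum(1 for v in values if v == 0.0) / len(values)

# Hypothetical pre-activations, symmetric around zero (illustration only).
pre_acts = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]

relu_sparsity = sparsity([relu(z) for z in pre_acts])
silu_sparsity = sparsity([silu(z) for z in pre_acts])
print(f"ReLU sparsity: {relu_sparsity:.2f}")  # ReLU sparsity: 0.50
print(f"SiLU sparsity: {silu_sparsity:.2f}")  # SiLU sparsity: 0.00
```

Real sparsity levels depend on the model and input. This toy only shows why ReLU yields exact, skippable zeros while SiLU does not.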

Contribution

It introduces a lightweight, training-free method for predicting activation sparsity that speeds up inference while keeping accuracy loss within 1 percentage point.

Findings

1. Achieves faster inference than state-of-the-art methods.
2. Keeps accuracy loss within 1 percentage point.
3. Predicts sparsity by comparing only sign bits, which keeps the predictor simple.

Abstract

Leveraging sparsity is crucial for optimizing large language model (LLM) inference. However, modern LLMs that use SiLU as their activation function exhibit minimal activation sparsity. Recent research has proposed replacing SiLU with ReLU to induce significant activation sparsity and has shown, through fine-tuning, that this causes no degradation in downstream task accuracy. However, taking full advantage of this sparsity required training a predictor to estimate it. In this paper, we introduce SparseInfer, a simple, lightweight, and training-free predictor of activation sparsity for ReLU-fied LLMs, in which activation sparsity is predicted by comparing only the sign bits of inputs and weights. To compensate for possible prediction inaccuracy, the predictor's conservativeness can be tuned adaptively, which also serves as a control knob for optimizing LLM inference. The proposed method achieves…
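
The sign-bit idea can be sketched in Python for a single ReLU-fied layer. This sketch rests on assumptions. The product x_j * w_j is negative exactly when the two sign bits differ, so XOR-ing packed sign masks and counting set bits gives the number of negative products without any multiplication. The decision rule is assumed, not taken from the paper: skip a neuron when negative products outnumber positive ones by more than a margin `alpha`. All function names and `alpha` are hypothetical. `alpha` stands in for the tunable conservativeness described in the abstract.

```python
import struct

def sign_bit(v: float) -> int:
    # IEEE-754 float32 sign bit: 1 for negative values (including -0.0), else 0.
    return struct.unpack("<I", struct.pack("<f", v))[0] >> 31

def pack_signs(vec: list[float]) -> int:
    # Pack the sign bits of a vector into one integer bitmask (bit j <- element j).
    # Weights are static, so their masks could be packed once, offline.
    mask = 0
    for j, v in enumerate(vec):
        mask |= sign_bit(v) << j
    return mask

def predict_active(x_mask: int, w_mask: int, n: int, alpha: int) -> bool:
    # XOR marks positions where x_j and w_j differ in sign, i.e. negative products.
    # Zero values are treated as positive here (sign bit 0), a simplification.
    negatives = bin(x_mask ^ w_mask).count("1")
    positives = n - negatives
    # Assumed rule: predict ReLU(x . w) == 0 only when negatives exceed positives
    # by more than alpha. Larger alpha -> fewer skips (more conservative).
    return negatives - positives <= alpha

def sparse_gate(x: list[float], rows: list[list[float]], alpha: int) -> list[float]:
    # Compute ReLU(x . w) only for rows predicted active; skip the rest.
    x_mask = pack_signs(x)
    out = []
    for w in rows:
        if predict_active(x_mask, pack_signs(w), len(x), alpha):
            out.append(max(0.0, sum(a * b for a, b in zip(x, w))))
        else:
            out.append(0.0)  # predicted zero: no multiply-adds spent on this row
    return out

# Hypothetical input and weight rows (illustration only).
x = [1.0, -2.0, 3.0, -0.5]
W = [[1.0, -1.0, 1.0, -1.0], [-1.0, 1.0, -1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
print(sparse_gate(x, W, alpha=0))  # [6.5, 0.0, 1.5]
```

The prediction can be wrong. With x = [0.1, 0.1, 0.1, 5.0] and w = [-1.0, -1.0, -1.0, 1.0], three of the four products are negative. At `alpha = 0` the neuron is therefore skipped, even though x · w ≈ 4.7. Raising `alpha` to 2 keeps it. That trade-off between skipped work and accuracy is what a conservativeness knob would control.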

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling