SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference
Jiho Shin, Hoeseok Yang, Youngmin Yi

TL;DR
SparseInfer is a training-free predictor that estimates activation sparsity in ReLU-based large language models, enabling faster inference with minimal accuracy loss by comparing sign bits of inputs and weights.
Contribution
It introduces a lightweight, training-free method for predicting activation sparsity, improving inference speed without degrading model accuracy.
Findings
Achieves faster inference speed compared to state-of-the-art methods.
Maintains accuracy loss within 1 percentage point.
Predicts sparsity by comparing only the sign bits of inputs and weights, so no predictor has to be trained.
Abstract
Leveraging sparsity is crucial for optimizing large language model inference. However, modern LLMs employing SiLU as their activation function exhibit minimal activation sparsity. Recent research has proposed replacing SiLU with ReLU to induce significant activation sparsity and showed no downstream task accuracy degradation through fine-tuning. However, taking full advantage of it required training a predictor to estimate this sparsity. In this paper, we introduce SparseInfer, a simple, lightweight, and training-free predictor for activation sparsity of ReLUfied LLMs, in which activation sparsity is predicted by comparing only the sign bits of inputs and weights. To compensate for possible prediction inaccuracy, an adaptive tuning of the predictor's conservativeness is enabled, which can also serve as a control knob for optimizing LLM inference. The proposed method achieves…
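The sign-bit idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `predict_inactive` and `relu_rows`, the parameter `alpha`, and the exact decision rule (predict a zero output when sign-mismatched positions outnumber sign-matched ones by a factor of `alpha`) are assumptions made for illustration. Plain Python lists stand in for the packed sign bits a real kernel would presumably use.

```python
def predict_inactive(x, w_row, alpha=1.0):
    """Guess whether ReLU(dot(w_row, x)) is zero using signs only.

    Assumed rule (hypothetical, for illustration): each position where the
    input and weight signs differ contributes a negative product, so if such
    positions outnumber the matching ones by a factor of alpha, the sum is
    predicted to be negative. A larger alpha makes the predictor more
    conservative, i.e. it skips fewer rows.
    """
    # Zero is treated as positive here, as a sign bit would treat +0.0.
    neg = sum(1 for xi, wi in zip(x, w_row) if (xi < 0) != (wi < 0))
    pos = len(x) - neg
    return neg > alpha * pos


def relu_rows(x, W, alpha=1.0):
    """Compute ReLU(W @ x), skipping rows predicted to be inactive."""
    out = []
    for w_row in W:
        if predict_inactive(x, w_row, alpha):
            out.append(0.0)  # skipped: the dot product is never computed
        else:
            dot = sum(wi * xi for wi, xi in zip(w_row, x))
            out.append(max(0.0, dot))
    return out


x = [1.0, -2.0, 3.0, -4.0]
W = [
    [-1.0, 2.0, -3.0, 4.0],   # all signs differ: true dot product is -30
    [1.0, -2.0, 3.0, -4.0],   # all signs match: true dot product is 30
    [-1.0, 2.0, -3.0, -4.0],  # three mismatches, yet true dot product is 2
]
print(relu_rows(x, W, alpha=1.0))  # third row is skipped by mistake
print(relu_rows(x, W, alpha=4.0))  # conservative setting computes it
```

The third row shows why a conservativeness knob matters in this sketch: sign counts ignore magnitudes, so one large matching term can outweigh several small mismatched ones. Raising `alpha` trades some skipped work for fewer such mispredictions.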
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
