Post-Training Statistical Calibration for Higher Activation Sparsity

Vui Seng Chua; Yujie Pan; Nilesh Jain

arXiv:2412.07174 · cs.LG · December 11, 2024

TL;DR

SCAP is a post-training activation pruning method that increases activation sparsity and LLM decoding speed by pre-calibrating activation distributions, with efficiency gains across a wide range of Transformer-based and other architectures.

Contribution

Introduces a novel post-training calibration framework, SCAP, that generalizes activation sparsification for Transformers and improves decoding speed without retraining.
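The basic mechanism, pruning low-magnitude input activations of a Fully-Connected layer with a threshold calibrated offline, can be sketched in NumPy. This is a minimal illustration under our own assumptions, not the SCAP implementation: the helpers `calibrate_threshold` and `sparse_fc` are hypothetical, the threshold is assumed to be a quantile of |x| over calibration data, and random Gaussian data stands in for real layer inputs.

```python
import numpy as np

def calibrate_threshold(calib_acts: np.ndarray, target_sparsity: float) -> float:
    """Hypothetical helper: choose a magnitude threshold so that roughly
    `target_sparsity` of the calibration activations fall below it."""
    return float(np.quantile(np.abs(calib_acts), target_sparsity))

def sparse_fc(x: np.ndarray, weight: np.ndarray, bias: np.ndarray, tau: float) -> np.ndarray:
    """FC layer y = W x + b with input activations |x| < tau pruned to zero.
    Only the weight columns of surviving inputs are read, which is where a
    decode-time kernel would save memory bandwidth."""
    keep = np.abs(x) >= tau            # mask of surviving input channels
    return weight[:, keep] @ x[keep] + bias

rng = np.random.default_rng(0)
d_in, d_out = 512, 256
calib = rng.standard_normal((1000, d_in))   # stand-in for recorded FC inputs
tau = calibrate_threshold(calib, target_sparsity=0.5)

W = rng.standard_normal((d_out, d_in))
b = rng.standard_normal(d_out)
x = rng.standard_normal(d_in)

y_sparse = sparse_fc(x, W, b, tau)
x_pruned = np.where(np.abs(x) >= tau, x, 0.0)
y_ref = W @ x_pruned + b                    # dense matmul on the pruned input
print(np.allclose(y_sparse, y_ref))         # True
print("fraction pruned:", np.mean(np.abs(x) < tau))
```

Gathering only the columns of W that match surviving inputs is what lets input-activation sparsity reduce weight traffic in memory-bound decoding; the dense reference check confirms the gathered product equals a full matmul on the pruned input.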

Findings

01

Achieves an additional 1.5x LLM decoding speedup over CATS at the same model quality.

02

Applies effectively across diverse models, including recent Transformer decoders, MoE, Mamba2, Transformer encoders, and pre-quantized models.

03

Shows robust Pareto efficiency relative to prior methods, supporting the method's practicality and scalability.

Abstract

We present Statistical Calibrated Activation Pruning (SCAP), a post-training activation pruning framework that (1) generalizes sparsification by targeting the input activations of Fully-Connected layers, allowing generic and flexible application across Transformers, and (2) features a simple Mode-Centering technique that pre-calibrates activation distributions to maximize post-training sparsity. Our results demonstrate robust Pareto efficiency compared to prior methods, translating to an additional 1.5x LLM decoding speedup over CATS at iso model quality. SCAP's effectiveness is empirically verified across a wide range of models, including recent Transformer Decoders, MoE, Mamba2, Transformer Encoders, and pre-quantized models, highlighting its practicality and scalability. The code is available at: https://github.com/IntelLabs/SCAP.
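Mode-Centering can be illustrated in the same style. The sketch below rests on our own assumptions, which the abstract does not spell out: the mode is estimated per input channel from a histogram of calibration activations, subtracted before magnitude pruning, and compensated exactly by folding W m into the bias, since W x + b = W (x - m) + (b + W m). The helpers `estimate_mode`, `fold_mode_into_bias`, and `centered_sparse_fc` are hypothetical, and the activations are synthetic, drawn with deliberately non-zero modes.

```python
import numpy as np

def estimate_mode(calib_acts: np.ndarray, bins: int = 50) -> np.ndarray:
    """Hypothetical helper: per-channel mode estimate, taken as the center of the
    tallest histogram bin. calib_acts has shape (n_samples, d_in)."""
    modes = np.empty(calib_acts.shape[1])
    for c in range(calib_acts.shape[1]):
        counts, edges = np.histogram(calib_acts[:, c], bins=bins)
        i = int(np.argmax(counts))
        modes[c] = 0.5 * (edges[i] + edges[i + 1])
    return modes

def fold_mode_into_bias(weight: np.ndarray, bias: np.ndarray, mode: np.ndarray) -> np.ndarray:
    """Since W x + b = W (x - m) + (b + W m), the shift is absorbed into the bias."""
    return bias + weight @ mode

def centered_sparse_fc(x, weight, folded_bias, mode, tau):
    """Center the input on its mode, prune |x - m| < tau, then apply the FC layer."""
    xc = x - mode
    keep = np.abs(xc) >= tau
    return weight[:, keep] @ xc[keep] + folded_bias

rng = np.random.default_rng(0)
d_in, d_out = 64, 32
shift = rng.uniform(0.5, 1.0, d_in)                        # synthetic non-zero modes
calib = shift + 0.2 * rng.standard_normal((50_000, d_in))  # stand-in calibration set
m = estimate_mode(calib)

W = rng.standard_normal((d_out, d_in))
b = rng.standard_normal(d_out)
b_folded = fold_mode_into_bias(W, b, m)
x = shift + 0.2 * rng.standard_normal(d_in)

# With tau = 0 nothing is pruned, so centering is an exact re-parameterization.
print(np.allclose(centered_sparse_fc(x, W, b_folded, m, 0.0), W @ x + b))  # True

tau = 0.2
print("sparsity without centering:", np.mean(np.abs(x) < tau))
print("sparsity with centering:   ", np.mean(np.abs(x - m) < tau))
```

When activations pile up away from zero, a fixed magnitude threshold prunes almost nothing; shifting the peak onto zero lets the same threshold remove far more entries, while the bias fold keeps the layer output unchanged before pruning.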

Code & Models

Repositories

intellabs/scap (PyTorch, official)

