Mamba-PTQ: Outlier Channels in Recurrent Large Language Models

Alessandro Pierro; Steven Abreu

arXiv:2407.12397·cs.LG·July 18, 2024

Open Access

TL;DR

This paper investigates the challenges of quantizing recurrent large language models, specifically Mamba, highlighting activation outliers as a key obstacle and proposing initial steps for outlier-aware quantization to improve deployment efficiency.

Contribution

The paper presents a first analysis of outlier channels in recurrent LLMs such as Mamba under post-training quantization, and suggests initial methods for handling these outliers.

Findings

01 Activation outliers are a major challenge in quantizing recurrent LLMs.

02 Baseline quantization results are affected by activation outliers.

03 Initial outlier-aware quantization strategies are proposed.
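
To make the outlier problem concrete, here is a minimal sketch of symmetric absmax int8 quantization applied to synthetic activations. It is not the paper's method or data: the activation matrix, the outlier channel index, and the 50x magnification are all assumptions chosen for illustration. The sketch compares one scale for the whole tensor against one scale per channel, which is one generic way to be outlier-aware.

```python
import numpy as np

def fake_quantize(x, scale, bits=8):
    """Symmetric absmax quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1               # 127 for int8
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

# Synthetic activations, shape (tokens, channels). Purely illustrative.
rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=(256, 64))
OUTLIER_CHANNEL = 7                          # hypothetical outlier channel
acts[:, OUTLIER_CHANNEL] *= 50.0             # assumed outlier magnitude

qmax = 127
# One scale for the whole tensor: dominated by the outlier channel.
per_tensor_scale = np.abs(acts).max() / qmax
# One scale per channel: each channel uses its own range.
per_channel_scale = np.abs(acts).max(axis=0, keepdims=True) / qmax

err_tensor = np.abs(fake_quantize(acts, per_tensor_scale) - acts)
err_channel = np.abs(fake_quantize(acts, per_channel_scale) - acts)

# Compare the error on the ordinary (non-outlier) channels only.
normal = np.arange(acts.shape[1]) != OUTLIER_CHANNEL
mean_err_tensor = err_tensor[:, normal].mean()
mean_err_channel = err_channel[:, normal].mean()

print(f"per-tensor  mean abs error: {mean_err_tensor:.4f}")
print(f"per-channel mean abs error: {mean_err_channel:.4f}")
```

With a single per-tensor scale, the outlier channel stretches the quantization grid so that most values in the ordinary channels collapse onto a few levels. Per-channel scaling avoids this in the sketch, though whether it is practical for activations depends on the hardware and kernels available.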

Abstract

Modern recurrent layers are emerging as a promising path toward edge deployment of foundation models, especially in the context of large language models (LLMs). Compressing the whole input sequence in a finite-dimensional representation enables recurrent layers to model long-range dependencies while maintaining a constant inference cost for each token and a fixed memory requirement. However, the practical deployment of LLMs in resource-limited environments often requires further model compression, such as quantization and pruning. While these techniques are well-established for attention-based models, their effects on recurrent layers remain underexplored. In this preliminary work, we focus on post-training quantization for recurrent LLMs and show that Mamba models exhibit the same pattern of outlier channels observed in attention-based LLMs. We show that the reason for the difficulty…
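
The constant per-token cost and fixed memory mentioned in the abstract can be illustrated with a generic diagonal linear recurrence. This is a simplified sketch, not Mamba's selective state-space layer (whose parameters depend on the input); the names `recurrent_step`, `A`, `B`, `C` and the sizes are assumptions made for the example.

```python
import numpy as np

def recurrent_step(h, x, A, B, C):
    """One step of a diagonal linear recurrence with a scalar input.

    h: state of shape (N,), x: scalar input, A/B/C: arrays of shape (N,).
    """
    h_next = A * h + B * x      # state update, O(N) work per token
    y = C @ h_next              # scalar readout
    return h_next, y

rng = np.random.default_rng(0)
N, T = 4, 32                    # state size and sequence length (illustrative)
A = rng.uniform(0.5, 0.95, size=N)
B = rng.normal(size=N)
C = rng.normal(size=N)
xs = rng.normal(size=T)

h = np.zeros(N)                 # the only memory carried across tokens
ys = []
for x in xs:
    h, y = recurrent_step(h, x, A, B, C)
    ys.append(y)
ys = np.array(ys)

print(h.shape)                  # state size is independent of T
```

The state `h` keeps the same shape however long the sequence grows, which is the property that makes such layers attractive for edge deployment. It also means every past token is compressed into those `N` values, so quantization error in the state or its inputs can matter for the whole sequence.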

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Focus