Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models

J. Pablo Muñoz; Jinjie Yuan; Nilesh Jain

arXiv:2501.17088 · cs.LG · January 29, 2025

TL;DR

This paper introduces Mamba-Shedder, a compression method for SSM-based models that reduces size and computation with minimal accuracy loss, achieving up to 1.4x speedup during inference.

Contribution

It presents a novel post-Transformer compression technique for SSM-based models that improves efficiency while preserving performance.

Findings

1. Achieves up to 1.4x inference speedup.
2. Reduces model size and computational overhead.
3. Largely maintains accuracy after component removal.

Abstract

Large pre-trained models have achieved outstanding results in sequence modeling. The Transformer block and its attention mechanism have been the main drivers of the success of these models. Recently, alternative architectures, such as Selective Structured State Space Models (SSMs), have been proposed to address the inefficiencies of Transformers. This paper explores the compression of SSM-based models, particularly Mamba and its hybrids. We study the sensitivity of these models to the removal of selected components at different granularities to reduce the model size and computational overhead, thus improving their efficiency while maintaining accuracy. The proposed solutions, collectively referred to as Mamba-Shedder, achieve a speedup of up to 1.4x during inference, demonstrating that model efficiency can be improved by eliminating several redundancies with minimal impact on the…
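The sensitivity study described above can be illustrated with a simple greedy loop. It repeatedly tries removing each remaining component, scores the resulting model, and drops the component whose removal hurts the score least. This is a minimal sketch under stated assumptions and not the authors' implementation. The function name `greedy_block_removal`, the string component identifiers, and the `evaluate` callback are hypothetical. In practice, `evaluate` would run the pruned model on a calibration set, for example by computing negative perplexity. The identifiers could name whole blocks or finer sub-modules, which is one way to vary the removal granularity.

```python
from typing import Callable, List, Sequence


def greedy_block_removal(
    blocks: Sequence[str],
    evaluate: Callable[[List[str]], float],
    num_to_remove: int,
) -> List[str]:
    """Greedily drop the component whose removal degrades the score least.

    Hypothetical helper: `evaluate` scores a model built only from the
    kept components (higher is better).
    """
    kept = list(blocks)
    for _ in range(num_to_remove):
        if len(kept) <= 1:  # always keep at least one component
            break
        best_idx, best_score = 0, float("-inf")
        # Try removing each remaining component and measure the resulting score.
        for i in range(len(kept)):
            candidate = kept[:i] + kept[i + 1:]
            score = evaluate(candidate)
            if score > best_score:
                best_idx, best_score = i, score
        kept.pop(best_idx)  # drop the least important component
    return kept


if __name__ == "__main__":
    # Toy stand-in for a real evaluation: each component contributes a fixed
    # (made-up) amount to the score, so removal order follows these values.
    importance = {"ssm_0": 5.0, "ssm_1": 0.2, "ssm_2": 3.0, "ssm_3": 0.1}
    result = greedy_block_removal(
        list(importance), lambda kept: sum(importance[b] for b in kept), 2
    )
    print(result)  # ['ssm_0', 'ssm_2']
```

Each removal step costs one evaluation per remaining component. A cheaper variant scores every component once and removes the lowest-scoring ones in a single pass, but it ignores interactions between removals.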

Code & Models

Repositories

intellabs/hardware-aware-automated-machine-learning
PyTorch · Official

Videos

Mamba-Shedder: Post-Transformer Compression for Efficient Selective Structured State Space Models · Underline