MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba

Masakazu Yoshimura; Teruaki Hayashi; Yota Maeda

arXiv:2411.03855 · cs.CL · April 2, 2025

TL;DR

This paper explores parameter-efficient fine-tuning (PEFT) methods for Mamba, a state space model alternative to Transformers, demonstrating improved adaptation to downstream tasks and proposing Mamba-specific PEFT techniques.

Contribution

It introduces Mamba-specific PEFT methods, modifies existing PEFT techniques for Mamba, and provides a framework that outperforms previous approaches in adapting Mamba models.

Findings

1. PEFT is more effective for Mamba than for Transformers.
2. Modified PEFT methods improve Mamba adaptation.
3. Combining multiple PEFT methods yields superior performance.

Abstract

An ecosystem of Transformer-based models has been established by building large models with extensive data. Parameter-efficient fine-tuning (PEFT) is a crucial technology for deploying these models to downstream tasks with minimal cost while achieving effective performance. Recently, Mamba, a State Space Model (SSM)-based model, has attracted attention as a potential alternative to Transformers. While many large-scale Mamba-based models have been proposed, efficiently adapting pre-trained Mamba-based models to downstream tasks remains unexplored. In this paper, we conduct an exploratory analysis of PEFT methods for Mamba. We investigate the effectiveness of existing PEFT methods for Transformers when applied to Mamba. We also modify these methods to better align with the Mamba architecture. Additionally, we propose new Mamba-specific PEFT methods that leverage the distinctive structure…
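As background for the PEFT methods the abstract refers to, the following is a minimal sketch of generic LoRA, one of the existing methods discussed in the reviews. It is not the paper's Mamba-specific method and is not taken from the sony/MambaPEFT repository; the class `LoRALinear`, the helper `matvec`, and all values are hypothetical names chosen for illustration. The idea is that the pre-trained weight stays frozen and only a low-rank update is trained.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Illustrative linear layer with a low-rank update (hypothetical name).

    Computes y = W x + (alpha / rank) * B (A x), where W is frozen and
    only A and B would be updated during fine-tuning.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pre-trained weight, shape d_out x d_in
        # A starts with small random values, shape rank x d_in
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        # B starts at zero, so the layer initially matches the frozen model
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy 4 x 4 weight standing in for a pre-trained projection (made-up values)
weight = [
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 1.0, 0.0, 2.0],
    [3.0, 0.0, 1.0, 0.0],
    [0.0, 3.0, 0.0, 1.0],
]
layer = LoRALinear(weight, rank=1, alpha=1.0)
x = [1.0, 2.0, 3.0, 4.0]

# With B initialized to zero, the output equals the frozen layer's output
print(layer.forward(x) == matvec(weight, x))
# Trainable parameters versus the full weight matrix
print(layer.trainable_params(), "of", len(weight) * len(weight[0]))
```

In this toy setting the low-rank update has 8 trainable parameters against 16 in the full matrix; the gap grows quickly with layer width, which is the usual motivation for methods of this kind.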

Peer Reviews

Decision · ICLR 2025 Poster

Reviewer 01 · Rating 5 · Confidence 4

Strengths

1. The paper not only improves existing PEFT algorithms to suit Mamba but also proposes targeted, innovative algorithms, providing the community with a broader selection of PEFT methods.
2. The experiments comprehensively showcase the performance of each PEFT algorithm, offering a framework for optimal algorithmic combinations.
3. The extensive experimental workload demonstrates thorough exploration of the algorithms.

Weaknesses

1. The improved and original PEFT algorithms did not achieve the best results. In the VTAB-1k benchmark, the performance of the improved Partial LoRA is nearly identical to that of LoRA, while Additional-scan underperforms ParallelAdapter with comparable training parameters. Compared to traditional PEFT algorithms, these methods lack strong competitive advantage.
2. Although LoRA demonstrates outstanding performance on the VTAB-1k benchmark, its comparison with the modified Partial LoRA on langu…

Reviewer 02 · Rating 8 · Confidence 5

Strengths

1. The paper focuses on a novel and competitive model architecture, SSMs, systematically analyzing the applicability of existing PEFT methods for the Mamba model, offering insights for designing efficient fine-tuning schemes for SSMs.
2. The paper adapts existing PEFT methods specifically for SSMs, proposing new approaches tailored for Mamba, achieving good performances.
3. The extensive experiments, covering both image and text modalities, provide comprehensive benchmarks for all methods and re…

Weaknesses

1. The paper covers too many aspects, including various PEFT methods, individual analyses, and hybrid architecture search. The authors should concentrate on one aspect and provide valuable conclusions for each part instead of describing each in detail, as this could reduce readability and divert focus.
2. The paper should consider including the most advanced works within each PEFT category, such as GPS[1] and SPT[2] for partial-tuning methods, and SSF[3] in addition to LoRA-based methods for rep…

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- The paper is well-written and easy to follow.
- Searching for diverse PEFT options for Mamba is informative and would be a valuable contribution for those who employ Mamba architectures.
- Studying extensive PEFT options based on a baseline yields convincing results.

Weaknesses

Main concerns

- The authors' claim in line 22 that PEFT is more effective for Mamba than for Transformers lacks adequate rationale and supporting evidence. There is no clear intuition as to why Mamba would benefit more from PEFT; the experiments appear biased due to an unfair comparison: specifically, Transformers were tested with only a limited range of PEFT options, while Mamba underwent more comprehensive testing (as illustrated in Table 1).
- The authors primarily tested Vision Mamb…

Code & Models

Repositories

sony/MambaPEFT · PyTorch · Official

Videos

MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba · SlidesLive

Taxonomy

Topics: Architecture and Computational Design

Methods: Softmax · Attention Is All You Need · ALIGN · Mamba: Linear-Time Sequence Modeling with Selective State Spaces