Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation
Seokil Ham, Hee-Seon Kim, Sangmin Woo, Changick Kim

TL;DR
This paper introduces ProDiaL, a parameter-efficient fine-tuning method for Mamba models that adapts only the Projectors through diagonal-centric linear transformations, achieving strong results while training less than 1% of the parameters.
Contribution
The paper shows that Projectors matter more than SSMs for transfer learning in Mamba and proposes ProDiaL, a PEFT method that adapts the Projectors through learned linear transformations rather than fine-tuning their weights directly.
Findings
ProDiaL fine-tunes less than 1% of parameters.
Projectors, not SSMs, are the predominant contributors to transfer learning in Mamba.
ProDiaL shows strong performance on vision and language Mamba models.
Abstract
Despite the growing interest in the Mamba architecture as a potential replacement for the Transformer architecture, parameter-efficient fine-tuning (PEFT) approaches for Mamba remain largely unexplored. In our study, we present two insight-driven contributions for PEFT in the Mamba architecture: (1) While state-space models (SSMs) have been regarded as the cornerstone of the Mamba architecture, and were therefore expected to play a primary role in transfer learning, our findings reveal that Projectors, not SSMs, are the predominant contributors to transfer learning. (2) Based on this observation, we propose a novel PEFT method specialized for the Mamba architecture: Projector-targeted Diagonal-centric Linear Transformation (ProDiaL). ProDiaL focuses on optimizing only the pretrained Projectors for new tasks through diagonal-centric linear transformation matrices, without directly fine-tuning the Projector weights.…
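To make the idea of adapting a frozen Projector through a transformation matrix more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract excerpt does not give the exact parameterization, so the form T = diag(d) + B @ A (a trainable diagonal plus a small low-rank correction), the function names, the rank, and the layer sizes are all assumptions chosen for illustration.

```python
import numpy as np

def make_transform(dim, rank, rng):
    """Hypothetical diagonal-centric transform T = diag(d) + B @ A.

    ASSUMPTION: this parameterization is illustrative only; the source
    text does not specify how ProDiaL builds its transformation.
    """
    d = np.ones(dim)                              # trainable diagonal, identity at init
    B = np.zeros((dim, rank))                     # trainable; zeros so B @ A starts at 0
    A = rng.standard_normal((rank, dim)) * 0.01   # trainable
    return d, B, A

def effective_weight(W, d, B, A):
    """Apply the transform to a frozen projector weight W of shape (out, in)."""
    T = np.diag(d) + B @ A                        # (in, in), dominated by its diagonal
    return W @ T                                  # W itself is never updated

rng = np.random.default_rng(0)
out_dim, in_dim, rank = 1536, 768, 4              # made-up sizes, not from the paper
W = rng.standard_normal((out_dim, in_dim))        # stands in for a pretrained Projector

d, B, A = make_transform(in_dim, rank, rng)
W_eff = effective_weight(W, d, B, A)

# At initialization T is the identity, so the pretrained behavior is preserved.
assert np.allclose(W_eff, W)

# Trainable parameters in the transform versus the frozen weight it adapts.
trainable = d.size + B.size + A.size
fraction = trainable / W.size
print(f"trainable={trainable}, frozen={W.size}, fraction={fraction:.4%}")
```

With these made-up sizes the transform holds well under 1% as many values as the single weight matrix it adapts. That ratio only illustrates why such a scheme can be parameter-efficient; it is not the paper's parameter accounting, which covers the full model.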
Taxonomy
Topics: Robotic Mechanisms and Dynamics · Piezoelectric Actuators and Control · Mechanical Engineering and Vibrations Research
Methods: Attention Is All You Need · Dense Connections · Label Smoothing · Dropout · Linear Layer · Layer Normalization · Byte Pair Encoding · Adam · Residual Connection · Softmax
