TL;DR
Memba is a novel parameter-efficient fine-tuning method for Mamba, a state space model, that uses bio-inspired LIM neurons to enhance temporal processing and improve performance on language and vision tasks.
Contribution
Memba introduces LIM neurons and cross-layer membrane transfer to adapt PEFT specifically for Mamba, addressing its unique temporal dynamics.
Findings
Significant performance improvements over existing PEFT methods.
Effective across both language and vision tasks.
Code availability facilitates reproducibility.
Abstract
State Space Models (SSMs) have emerged as powerful alternatives to attention-based Transformers, with Mamba demonstrating impressive efficiency and scalability. As these models grow increasingly larger, the need for Parameter-Efficient Fine-Tuning (PEFT) methods becomes critical to adapt pre-trained Mamba to downstream tasks without prohibitive computational costs. However, previous approaches simply apply traditional Transformer-tailored PEFT methods without addressing the unique temporal processing dynamics of SSMs. To address this limitation, we propose Memba, a membrane-driven PEFT approach specifically designed for Mamba. Memba introduces Leaky Integrate Membrane (LIM) neurons as bio-inspired gating mechanisms that naturally accumulate membrane potentials over time, enhancing selective information retention. By strategically combining LIM neurons with Low-Rank Adaptations (LoRA)…
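To make the idea of a gate that "accumulates membrane potentials over time" concrete, here is a minimal NumPy sketch of a leaky-integrate gate applied over chunks of a sequence. It is an illustration under stated assumptions, not the authors' implementation: the function name `lim_gate`, the use of the chunk mean as input current, the soft reset (subtracting the threshold), and the default values are all hypothetical. The chunk count T, leak factor τ, and threshold Vth mentioned in the reviews below correspond to `num_chunks`, `leak`, and `v_th`.

```python
import numpy as np

def lim_gate(x, num_chunks=4, leak=0.5, v_th=1.0, v_init=None):
    """Leaky-integrate gate over chunks (illustrative sketch only).

    x:      array of shape (seq_len, dim)
    v_init: optional starting membrane potential of shape (dim,)
    Returns (gate, v) where gate has the same shape as x and v is the
    final membrane potential.
    """
    seq_len, dim = x.shape
    v = np.zeros(dim) if v_init is None else np.array(v_init, dtype=float)
    pieces = []
    for chunk in np.array_split(x, num_chunks, axis=0):
        # Leaky integration: decay the old potential, add this chunk's mean input.
        v = leak * v + chunk.mean(axis=0)
        # Assumed soft reset: subtract the threshold wherever it is reached.
        v = np.where(v >= v_th, v - v_th, v)
        # Every token in the chunk shares the chunk's membrane potential.
        pieces.append(np.repeat(v[None, :], len(chunk), axis=0))
    return np.concatenate(pieces, axis=0), v

# Usage: the gate would multiply some hidden activation elementwise.
hidden = np.ones((8, 3))
gate, v_final = lim_gate(hidden, num_chunks=4, leak=0.5, v_th=2.0)
gated = hidden * gate
```

One possible reading of cross-layer membrane transfer is to pass the returned `v_final` as `v_init` to the next layer's gate; that wiring is also an assumption here.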
Peer Reviews
Decision: ICLR 2026 Poster
1. The LIM neuron brings biologically inspired temporal processing into Mamba's gating, filling the gap left by its simpler gates and helping the model learn what to remember or forget over time.
2. The paper backs this up with clear analysis (such as loss decomposition and bounded regularization), explaining how LIM adapts over time and smooths the loss landscape.
3. The writing is clear and the figure is well illustrated.
1. The performance depends on tuning hyperparameters such as the number of chunks (T), the leak factor (τ), and the threshold (Vth), so LIM likely needs task-specific tuning.
2. The theoretical analysis shows that the LIM neuron acts as a regularizer that smooths the loss landscape. However, the paper fails to explain the causal link between this optimization-level effect and the claimed practical benefits of enhanced temporal modeling and selective attention.
3. The evaluation's focus on classification-style b…
1. PEFT for Mamba is quite a novel topic, and the temporal gating of LIM seems to be a good design for fine-tuning the gate values in SSMs.
2. The experiments are wide-ranging, spanning both language and vision tasks. The analysis results are also abundant and convincing. In addition to the experiments, the theoretical analysis of loss bounds is also interesting and offers insight into the design and its lower loss.
3. The proposed LIM is efficient, with minimal additional parameters compare…
1. How is the proposed bypass of the SSM in low-rank adapter training related to the biological term "membrane"? The paper does not include any references or tutorials on the membrane mechanism or related background, so it is somewhat confusing why the proposed mechanism resembles a membrane.
2. The average membrane potential values in the figures decrease in steps across the temporal chunks, and the authors attribute this phenomenon to a "forget" behavior of LIM.
- The use of LIM neurons for temporal gating of the output of SSM is innovative and outperforms existing methods. The motivation of the cross-layer membrane potential transfer is also reasonable.
- Extensive experiments on both language (commonsense reasoning) and vision (VTAB-1k) tasks prove that Memba outperforms existing PEFT methods, including LoRA and other recent SSM-specific PEFTs.
Major Weakness
- From Table 7, the slowdown appears to be larger than the increase in parameter count would suggest, so Memba seems slower than other PEFT methods. What about memory consumption? In my opinion, low memory consumption is more important than inference speed, since it enables fine-tuning of large models on limited GPUs.
- I understand that LIM processes data recurrently in fixed-size chunks. Since the number of image tokens is predetermined, that…
Taxonomy
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
