Is Mamba Capable of In-Context Learning?

Riccardo Grazzi; Julien Siems; Simon Schrodi; Thomas Brox; Frank Hutter

arXiv:2402.03170 · cs.LG · April 25, 2024 · 5 cites

TL;DR

This paper demonstrates that Mamba, a state space model, exhibits in-context learning capabilities comparable to transformers, especially for long input sequences, offering an efficient alternative for such tasks.

Contribution

The work provides empirical evidence that Mamba can perform in-context learning similarly to transformers, extending ICL capabilities to a more scalable model.

Findings

01 Mamba matches transformer performance in ICL tasks.

02 Mamba effectively handles long input sequences.

03 ICL in Mamba involves incremental internal optimization.

Abstract

State-of-the-art foundation models such as GPT-4 perform surprisingly well at in-context learning (ICL), a variant of meta-learning concerning the learned ability to solve tasks during a neural network forward pass, exploiting contextual information provided as input to the model. This useful ability emerges as a side product of the foundation model's massive pretraining. While transformer models are currently the state of the art in ICL, this work provides empirical evidence that Mamba, a newly proposed state space model which scales better than transformers w.r.t. the input sequence length, has similar ICL capabilities. We evaluated Mamba on tasks involving simple function approximation as well as more complex natural language processing problems. Our results demonstrate that, across both categories of tasks, Mamba closely matches the performance of transformer models for ICL. Further…
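To make the "simple function approximation" setting concrete, here is a minimal sketch of how one in-context regression prompt could be constructed, together with a least-squares reference predictor that uses only the context. The one-dimensional linear function class, the Gaussian sampling, and the names `sample_icl_prompt` and `least_squares_predict` are illustrative assumptions, not the paper's actual setup or code.

```python
import random

def sample_icl_prompt(num_examples, rng):
    """Build one in-context regression prompt for a hidden function f(x) = w * x.

    Assumption: a 1-D noise-free linear task, chosen only for illustration.
    """
    w = rng.gauss(0.0, 1.0)  # hidden task parameter, never shown to the model
    xs = [rng.gauss(0.0, 1.0) for _ in range(num_examples + 1)]
    ys = [w * x for x in xs]
    context = list(zip(xs[:-1], ys[:-1]))  # (x_i, f(x_i)) demonstration pairs
    query, target = xs[-1], ys[-1]         # the model must predict f(query)
    return context, query, target

def least_squares_predict(context, query):
    """Reference predictor: fit w by least squares on the context pairs only."""
    num = sum(x * y for x, y in context)
    den = sum(x * x for x, _ in context)
    w_hat = num / den if den > 0 else 0.0
    return w_hat * query

rng = random.Random(0)
context, query, target = sample_icl_prompt(num_examples=8, rng=rng)
prediction = least_squares_predict(context, query)
error = abs(prediction - target)
```

A sequence model evaluated for ICL would receive the context pairs and the query as one input sequence and be scored on how close its output is to `target`, with no weight updates at test time. The least-squares predictor serves here only as a baseline to compare against.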

Code & Models

Repositories

automl/is_mamba_capable_of_icl (PyTorch, official)

Taxonomy

Topics: Education and Technology Integration

Methods: Attention Is All You Need · Dropout · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Dense Connections · Label Smoothing