State Soup: In-Context Skill Learning, Retrieval and Mixing

Maciej Pióro; Maciej Wołczyk; Razvan Pascanu; Johannes von Oswald; João Sacramento

arXiv:2406.08423·cs.LG·June 13, 2024

TL;DR

This paper explores linear state interpolation in gated-linear recurrent neural networks: internal states are treated as task vectors that can be stored, retrieved, and linearly combined, with the aim of improving next-token perplexity and downstream in-context learning performance.

Contribution

It proposes treating the internal states of recurrent neural networks as task vectors that can be linearly combined, a fast form of model merging studied on Mamba-2.8b, and reports preliminary improvements in sequence modeling and in-context learning.

Findings

01. Linear state interpolation improves next-token perplexity.

02. State merging improves downstream in-context learning performance.

03. The evidence is preliminary and comes from experiments on Mamba-2.8b, a pretrained recurrent model.

Abstract

A new breed of gated-linear recurrent neural networks has reached state-of-the-art performance on a range of sequence modeling problems. Such models naturally handle long sequences efficiently, as the cost of processing a new input is independent of sequence length. Here, we explore another advantage of these stateful sequence models, inspired by the success of model merging through parameter interpolation. Building on parallels between fine-tuning and in-context learning, we investigate whether we can treat internal states as task vectors that can be stored, retrieved, and then linearly combined, exploiting the linearity of recurrence. We study this form of fast model merging on Mamba-2.8b, a pretrained recurrent model, and present preliminary evidence that simple linear state interpolation methods suffice to improve next-token perplexity as well as downstream in-context learning task…
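The idea of linearly combining stored states can be illustrated with a minimal sketch. It assumes a toy diagonal gated-linear recurrence, not Mamba-2.8b and not necessarily the paper's exact procedure; the names `gate`, `step`, `run`, and `mix_states`, the gate form, and all numeric inputs are hypothetical. Because the update is linear in the state, running a shared query from a mixed state gives the same result as mixing the two separately continued states, provided the mixing weights sum to 1.

```python
import math


def gate(x):
    # Hypothetical input-dependent forget gate with values in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def step(state, x):
    # One step of a toy diagonal gated-linear recurrence, per channel:
    #   h_t = a_t * h_{t-1} + (1 - a_t) * x_t,  with a_t = gate(x_t).
    # The gate depends only on the input, so the update is linear in h.
    return [gate(xi) * hi + (1.0 - gate(xi)) * xi for hi, xi in zip(state, x)]


def run(state, sequence):
    # Process a whole sequence and return the final state.
    for x in sequence:
        state = step(state, x)
    return state


def mix_states(states, weights):
    # Linear combination of stored states (the "soup").
    dim = len(states[0])
    return [sum(w * s[i] for w, s in zip(weights, states)) for i in range(dim)]


# Two made-up "skill" contexts and a shared query sequence.
zero = [0.0, 0.0]
context_a = [[1.0, -0.5], [0.3, 0.8]]
context_b = [[-1.2, 0.4], [0.7, -0.1], [0.2, 0.2]]
query = [[0.5, 0.5], [-0.3, 1.0]]

# Store one state per context, then mix them with weights summing to 1.
state_a = run(zero, context_a)
state_b = run(zero, context_b)
weights = [0.5, 0.5]
mixed = mix_states([state_a, state_b], weights)

# Continuing from the mixed state matches mixing the continued states.
after_mixed = run(mixed, query)
mix_of_after = mix_states([run(state_a, query), run(state_b, query)], weights)
max_gap = max(abs(p - q) for p, q in zip(after_mixed, mix_of_after))
```

In this toy setting `max_gap` is zero up to floating-point rounding. A full model also contains nonlinear layers between recurrent blocks, so the equivalence is exact only for the recurrence itself.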

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Machine Learning and Algorithms