Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting
Suraj Anand, Michael A. Lepori, Jack Merullo, Ellie Pavlick

TL;DR
This paper investigates the ability of language models to switch between in-context learning and in-weights learning, introducing methods to control and balance these strategies within a single model for improved adaptability.
Contribution
It introduces a dual process learning framework that enables models to flexibly deploy both in-context and in-weights learning strategies through novel pretraining and finetuning methods.
Findings
Structural in-context learning appears early in training but diminishes quickly.
Proposed methods can modulate the preference for in-context versus in-weights learning.
A dual process strategy allows coexistence of both learning strategies within a single model.
Abstract
Language models have the ability to perform in-context learning (ICL), allowing them to flexibly adapt their behavior based on context. This contrasts with in-weights learning (IWL), where memorized information is encoded in model parameters after iterated observations of data. An ideal model should be able to flexibly deploy both of these abilities. Despite their apparent ability to learn in-context, language models are known to struggle when faced with unseen or rarely seen tokens (Land & Bartolo, 2024). Hence, we study structural in-context learning, which we define as the ability of a model to execute in-context learning on arbitrary novel tokens -- so called because the model must generalize on the basis of e.g. sentence structure or task structure, rather than content encoded in token embeddings. We study structural in-context algorithms on both synthetic and…
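The distinction the abstract draws can be made concrete with a toy sketch. This is only an illustration, not the paper's models or training method: the memorized table, the nonsense tokens, and both function names are hypothetical stand-ins. An "in-weights" predictor can only answer for tokens it has memorized, while an "in-context" predictor reads the answer from exemplars supplied at query time, so it still works on tokens it has never seen.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for knowledge stored in model parameters:
# token -> label associations memorized from repeated training exposure.
IN_WEIGHTS_LABELS = {"cat": "noun", "run": "verb"}


def in_weights_predict(query: str) -> Optional[str]:
    """Answer using only memorized associations; None for unseen tokens."""
    return IN_WEIGHTS_LABELS.get(query)


def in_context_predict(context: List[Tuple[str, str]], query: str) -> Optional[str]:
    """Answer by finding the query among the (token, label) exemplars in context."""
    for token, label in context:
        if token == query:
            return label
    return None


# Novel tokens that never appeared in the memorized table.
context = [("blicket", "noun"), ("dax", "verb")]

print(in_weights_predict("cat"))               # memorized token
print(in_weights_predict("blicket"))           # novel token, nothing memorized
print(in_context_predict(context, "blicket"))  # novel token, resolved from context
```

In this toy, the in-context strategy depends only on the position of the query among the exemplars and not on anything stored about the token itself, which is the sense in which the abstract calls such generalization structural.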
Taxonomy
Topics: Complex Systems and Decision Making
