Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting

Suraj Anand; Michael A. Lepori; Jack Merullo; Ellie Pavlick

arXiv:2406.00053 · cs.CL · March 4, 2025

Open Access

TL;DR

This paper investigates the ability of language models to switch between in-context learning and in-weights learning, introducing methods to control and balance these strategies within a single model for improved adaptability.

Contribution

It introduces a dual process learning framework that enables models to flexibly deploy both in-context and in-weights learning strategies through novel pretraining and finetuning methods.

Findings

01 Structural in-context learning appears early in training but diminishes quickly.

02 Proposed methods can modulate the preference for in-context versus in-weights learning.

03 A dual process strategy allows coexistence of both learning strategies within a single model.

Abstract

Language models have the ability to perform in-context learning (ICL), allowing them to flexibly adapt their behavior based on context. This contrasts with in-weights learning (IWL), where memorized information is encoded in model parameters after iterated observations of data. An ideal model should be able to flexibly deploy both of these abilities. Despite their apparent ability to learn in-context, language models are known to struggle when faced with unseen or rarely seen tokens (Land & Bartolo, 2024). Hence, we study structural in-context learning, which we define as the ability of a model to execute in-context learning on arbitrary novel tokens -- so called because the model must generalize on the basis of e.g. sentence structure or task structure, rather than content encoded in token embeddings. We study structural in-context algorithms on both synthetic and…

Videos

Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting (SlidesLive)

Taxonomy

Topics: Complex Systems and Decision Making