Understanding Contextual Recall in Transformers: How Finetuning Enables In-Context Reasoning over Pretraining Knowledge

Bhavya Vasudeva; Puneesh Deora; Alberto Bietti; Vatsal Sharan; Christos Thrampoulidis

arXiv:2603.20969 · cs.LG · March 24, 2026

TL;DR

This paper investigates how finetuning enables transformers to perform contextual recall, a form of in-context reasoning over pretraining knowledge. It finds that finetuning induces the low-dimensional representations this ability requires, which pretraining alone does not produce.

Contribution

The paper introduces a controlled synthetic framework for analyzing contextual recall and demonstrates that finetuning on related tasks induces the representations and mechanisms needed for in-context reasoning.

Findings

01 Pretraining yields factual knowledge but not contextual recall.

02 Finetuning on specific tasks triggers emergence of contextual recall.

03 Low-dimensional latent encodings of attribute types form after finetuning.

Abstract

Transformer-based language models excel at in-context learning (ICL), where they can adapt to new tasks based on contextual examples, without parameter updates. In a specific form of ICL, which we refer to as contextual recall, models pretrained on open-ended text leverage pairwise examples to recall specific facts in novel prompt formats. We investigate whether contextual recall emerges from pretraining alone, what finetuning is required, and what mechanisms drive the necessary representations. For this, we introduce a controlled synthetic framework where pretraining sequences consist of subject-grammar-attribute tuples, with attribute types tied to grammar statistics. We demonstrate that while such pretraining successfully yields factual knowledge, it is insufficient for contextual recall: models fail to implicitly infer attribute types when the grammar statistics are removed…
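To make the setup in the abstract concrete, the following is a minimal toy sketch of what subject-grammar-attribute pretraining sequences and a grammar-free contextual recall prompt might look like. It is not the authors' data generator: the vocabulary, the function names, and the choice to give each attribute type its own disjoint pool of grammar tokens (a crude stand-in for "attribute types tied to grammar statistics") are all assumptions made for illustration.

```python
import random

# Hypothetical toy vocabulary; none of these names come from the paper.
SUBJECTS = [f"s{i}" for i in range(6)]
ATTRIBUTE_TYPES = ["color", "city"]
ATTRIBUTE_VALUES = {
    "color": ["red", "blue", "green"],
    "city": ["paris", "rome", "oslo"],
}
# Assumption: each attribute type draws its "grammar" tokens from its own pool,
# so the grammar tokens reveal which attribute type a sequence is about.
GRAMMAR_TOKENS = {
    "color": ["g0", "g1", "g2"],
    "city": ["g3", "g4", "g5"],
}


def build_knowledge_base(rng):
    """Assign every subject one value per attribute type."""
    return {
        s: {t: rng.choice(ATTRIBUTE_VALUES[t]) for t in ATTRIBUTE_TYPES}
        for s in SUBJECTS
    }


def pretraining_sequence(kb, rng, n_grammar=3):
    """One subject-grammar-attribute tuple, flattened to a token list."""
    subject = rng.choice(SUBJECTS)
    attr_type = rng.choice(ATTRIBUTE_TYPES)
    grammar = [rng.choice(GRAMMAR_TOKENS[attr_type]) for _ in range(n_grammar)]
    return [subject, *grammar, kb[subject][attr_type]]


def contextual_recall_prompt(kb, rng, attr_type, n_examples=3):
    """Pairwise (subject, attribute) examples with no grammar tokens,
    followed by a query subject. Returns (prompt tokens, target attribute)."""
    *context, query = rng.sample(SUBJECTS, n_examples + 1)
    tokens = []
    for s in context:
        tokens += [s, kb[s][attr_type]]
    tokens.append(query)
    return tokens, kb[query][attr_type]


rng = random.Random(0)
kb = build_knowledge_base(rng)
seq = pretraining_sequence(kb, rng)
prompt, target = contextual_recall_prompt(kb, rng, "color")
```

In this sketch the prompt carries no grammar tokens, so the attribute type of the query can only be inferred from the in-context pairs, which is the situation the abstract describes as contextual recall.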

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning