Can Gradient Descent Simulate Prompting?

Eric Zhang; Leshem Choshen; Jacob Andreas

arXiv:2506.20989 · cs.CL · June 27, 2025

TL;DR

This paper proposes a meta-training method for language models under which gradient updates emulate the effect of conditioning on new information in the prompt. Training targets are the model's own prompted predictions, so no ground-truth labels are needed.

Contribution

A meta-training approach, built on tools from gradient-based meta-learning, that trains a language model so that fine-tuning on new information approximates the effect of placing that information in the prompt.

Findings

1. After meta-training, gradient updates can partially, and occasionally fully, emulate the effects of prompting.

2. Models show improved performance on reversal tasks and question answering.

3. The approach offers insights into model generalization and long-context modeling.

Abstract

There are two primary ways of incorporating new information into a language model (LM): changing its prompt or changing its parameters, e.g. via fine-tuning. Parameter updates incur no long-term storage cost for model changes. However, for many model updates, prompting is significantly more effective: prompted models can generalize robustly from single examples and draw logical inferences that do not occur under standard fine-tuning. Can models be modified so that fine-tuning does emulate prompting? This paper describes a method for meta-training LMs such that gradient updates emulate the effects of conditioning on new information. Our approach uses tools from gradient-based meta-learning but uses an LM's own prompted predictions as targets, eliminating the need for ground-truth labels. Subsequent gradient descent training recovers some (and occasionally all) of prompted model…
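The training loop described in the abstract can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-parameter "LM", the `inner_step` loss, the constant `INNER_LR`, and all function names are hypothetical. Squared error stands in for whatever divergence between predictive distributions a real LM would use, and a finite-difference gradient stands in for automatic differentiation through the inner update. What the sketch keeps from the abstract is the structure: a frozen copy of the model with the context in its prompt provides the targets, and the trainable parameters are optimized so that one gradient step on the context, followed by an unprompted prediction, matches those targets.

```python
import random

INNER_LR = 0.1  # hypothetical inner-loop step size


def predict(theta, context, query):
    """Toy 'LM' prediction; context=0.0 means nothing is in the prompt."""
    return (theta[0] + theta[1] * context) * query


def inner_step(theta, context):
    """One gradient step on a toy 'read the context' loss,
    0.5 * (theta[2] * theta[0] - context) ** 2, with no context in the prompt."""
    residual = theta[2] * theta[0] - context
    grad = [residual * theta[2], 0.0, residual * theta[0]]
    return [t - INNER_LR * g for t, g in zip(theta, grad)]


def meta_loss(theta, teacher, batch):
    """Mean squared gap between the frozen teacher's prompted prediction
    and the student's unprompted prediction after one inner update."""
    total = 0.0
    for context, query in batch:
        target = predict(teacher, context, query)  # prompted, no label needed
        updated = inner_step(theta, context)       # "fine-tune" on the context
        total += (predict(updated, 0.0, query) - target) ** 2
    return total / len(batch)


def meta_grad(theta, teacher, batch, eps=1e-5):
    """Central finite differences through the inner step (autodiff stand-in)."""
    grads = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        diff = meta_loss(up, teacher, batch) - meta_loss(down, teacher, batch)
        grads.append(diff / (2 * eps))
    return grads


def meta_train(theta, teacher, batch, steps=300, lr=0.5):
    """Outer loop: plain gradient descent on the meta-loss."""
    theta = list(theta)
    for _ in range(steps):
        grads = meta_grad(theta, teacher, batch)
        theta = [t - lr * g for t, g in zip(theta, grads)]
    return theta


rng = random.Random(0)
batch = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(32)]
teacher = [1.0, 0.5, 1.0]  # frozen copy of the initial parameters
before = meta_loss(teacher, teacher, batch)
theta = meta_train(teacher, teacher, batch)
after = meta_loss(theta, teacher, batch)
print("meta-loss before:", before, "after:", after)
```

In this toy the student never sees the context in its prompt, so `theta[1]` receives no meta-gradient; only the parameters that shape the inner update are adjusted. Freezing the teacher is a choice made here to keep the targets from drifting during meta-training.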

Taxonomy

Topics: Model Reduction and Neural Networks