Can Gradient Descent Simulate Prompting?
Eric Zhang, Leshem Choshen, Jacob Andreas

TL;DR
This paper proposes a meta-training method for language models that makes gradient updates mimic the effect of prompt-based conditioning. The goal is for fine-tuning to gain some of the flexibility and generalization of prompting, without requiring additional ground-truth labels.
Contribution
The paper introduces a novel meta-training approach that aligns gradient updates with the effects of prompting, bridging the gap between fine-tuning and prompting in language models.
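This alignment can be written as a meta-objective. The formulation below is a sketch, not the paper's own equation. It assumes a single inner SGD step with learning rate α on the next-token loss of the new information c, and a KL divergence between the prompted prediction and the updated model's unprompted prediction for a query q. The notation (θ, α, c, q, and sg for stop-gradient) is ours, and the paper's exact loss may differ.

```latex
\theta'(c) = \theta - \alpha \, \nabla_\theta \bigl[ -\log p_\theta(c) \bigr]

\mathcal{L}_{\text{meta}}(\theta) = \mathbb{E}_{(c,\,q)} \Bigl[ \mathrm{KL}\bigl( \operatorname{sg}\!\left[ p_\theta(\cdot \mid c, q) \right] \,\big\|\, p_{\theta'(c)}(\cdot \mid q) \bigr) \Bigr]
```

Minimizing this objective with respect to θ, differentiating through the inner update θ'(c), trains the model so that one gradient step on c moves its predictions toward what it would have predicted with c in the prompt. The target is the model's own prompted distribution, so no ground-truth labels are involved.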
Findings
After meta-training, gradient descent can partially, and occasionally fully, emulate the effects of prompting.
Models show improved performance on reversal tasks and question answering.
The approach offers insights into model generalization and long-context modeling.
Abstract
There are two primary ways of incorporating new information into a language model (LM): changing its prompt or changing its parameters, e.g. via fine-tuning. Parameter updates incur no long-term storage cost for model changes. However, for many model updates, prompting is significantly more effective: prompted models can generalize robustly from single examples and draw logical inferences that do not occur under standard fine-tuning. Can models be modified so that fine-tuning does emulate prompting? This paper describes a method for meta-training LMs such that gradient updates emulate the effects of conditioning on new information. Our approach uses tools from gradient-based meta-learning but uses an LM's own prompted predictions as targets, eliminating the need for ground-truth labels. Subsequent gradient descent training recovers some (and occasionally all) of prompted model…
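The core idea is that the model's own prompted prediction serves as the target for a gradient-updated model that no longer sees the new information in its prompt. A toy sketch can make this concrete. Everything in it is hypothetical and chosen for illustration:

- A unigram softmax "LM" stands in for a real model, so the query is ignored.
- `prompted_distribution` fakes in-context conditioning by adding one logit per context occurrence.
- The outer loop tunes only the inner learning rate by grid search. The method described here instead uses gradient-based meta-learning to train the LM's own weights.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def token_frequencies(context, vocab_size):
    """Empirical distribution of the tokens in the context."""
    counts = [0] * vocab_size
    for t in context:
        counts[t] += 1
    return [c / len(context) for c in counts]


def prompted_distribution(theta, context):
    """HYPOTHETICAL stand-in for p_theta(y | context, query): in this toy,
    conditioning on the context adds one logit per occurrence of each token."""
    counts = [0.0] * len(theta)
    for t in context:
        counts[t] += 1.0
    return softmax([th + c for th, c in zip(theta, counts)])


def inner_update(theta, context, lr):
    """One SGD step on the mean negative log-likelihood of the context tokens.
    For a unigram softmax model the gradient is softmax(theta) - freq(context)."""
    probs = softmax(theta)
    freq = token_frequencies(context, len(theta))
    return [th - lr * (p - f) for th, p, f in zip(theta, probs, freq)]


def meta_loss(theta, context, lr):
    """KL between the prompted prediction (the target) and the prediction of
    the updated model, which no longer has the context in its prompt."""
    target = prompted_distribution(theta, context)
    student = softmax(inner_update(theta, context, lr))
    return kl_divergence(target, student)


theta = [0.0, 0.0, 0.0, 0.0]  # toy unigram LM over a 4-token vocabulary
context = [2, 2, 2]           # the "new information" to absorb

# Simplified outer loop: choose the inner learning rate whose gradient step
# best reproduces the prompted prediction.
candidates = [0.5 * i for i in range(13)]  # 0.0, 0.5, ..., 6.0
losses = {lr: meta_loss(theta, context, lr) for lr in candidates}
best_lr = min(losses, key=losses.get)
print(f"no update: KL = {losses[0.0]:.3f}")
print(f"best lr = {best_lr}: KL = {losses[best_lr]:.3g}")
```

With these numbers the search selects lr = 3.0, where the KL is exactly zero. A step of that size raises token 2's logit relative to the others by 3, the same margin the hypothetical prompted model adds for three occurrences. This exact match is an artifact of the toy.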
Taxonomy
Topics: Model Reduction and Neural Networks
