Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning
Xiaolei Wang, Xinyu Tang, Wayne Xin Zhao, Ji-Rong Wen

TL;DR
This paper investigates the pre-training dynamics of in-context learning, revealing a competitive relationship between task recognition and task learning abilities, and proposes an adaptive ensemble method to enhance ICL performance.
Contribution
This paper is the first to analyze the competitive dynamics between task recognition (TR) and task learning (TL) during pre-training, and it introduces an adaptive ensemble approach to improve ICL performance.
Findings
TR and TL are competitive during pre-training.
The competition between TR and TL is strongly negatively correlated with ICL performance.
Adaptive ensemble learning boosts ICL, outperforming larger models.
Abstract
The emergence of in-context learning (ICL) is potentially attributed to two major abilities: task recognition (TR) for recognizing the task from demonstrations and utilizing pre-trained priors, and task learning (TL) for learning from demonstrations. However, the relationship between the two abilities, and how it affects the emergence of ICL, remains unclear. In this paper, we take the first step by examining the pre-training dynamics of the emergence of ICL. With carefully designed metrics, we find that these two abilities are, in fact, competitive during pre-training. Moreover, we observe a strong negative correlation between the competition and ICL performance. Further analysis of common pre-training factors (i.e., model size, dataset size, and data curriculum) demonstrates possible ways to manage the competition. Based on these insights, we propose a simple yet effective method…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Reinforcement Learning in Robotics
