Provable In-Context Vector Arithmetic via Retrieving Task Concepts
Dake Bu, Wei Huang, Andi Han, Atsushi Nitanda, Qingfu Zhang, Hau-San Wong, Taiji Suzuki

TL;DR
This paper presents a theoretical framework explaining how in-context learning (ICL) in large language models enables vector arithmetic for factual recall. It supports its robustness and generalization claims with both theoretical analysis and empirical simulations.
Contribution
It introduces a new theoretical model of in-context learning that shows how transformers perform vector arithmetic for factual recall and generalize under distribution shifts.
Findings
Proves convergence of the 0-1 loss for the trained model
Shows robustness to concept recombination and distribution shifts
Empirical simulations support the theoretical results
Abstract
In-context learning (ICL) has garnered significant attention for its ability to grasp functions/tasks from demonstrations. Recent studies suggest the presence of a latent task/function vector in LLMs during ICL. Merullo et al. (2024) showed that LLMs leverage this vector alongside the residual stream for Word2Vec-like vector arithmetic, solving factual-recall ICL tasks. Additionally, recent work empirically highlighted the key role of Question-Answer data in enhancing factual-recall capabilities. Despite these insights, a theoretical explanation remains elusive. To take a step forward, we propose a theoretical framework building on empirically grounded hierarchical concept modeling. We develop an optimization theory, showing how nonlinear residual transformers trained via gradient descent on cross-entropy loss perform factual-recall ICL tasks via vector arithmetic. We prove 0-1 loss…
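To make the "Word2Vec-like vector arithmetic" idea concrete, here is a toy sketch. It is not the paper's construction and involves no transformer or training. It assumes a hypothetical hierarchical embedding in which every word is an entity-specific direction plus a shared concept direction ("country" or "capital city"). A task vector is estimated as the mean difference between answer and query embeddings over the in-context demonstrations, added to a new query, and decoded by nearest cosine neighbor. The 0-1 loss is then the fraction of queries decoded incorrectly. The names `task_vector`, `predict`, and `zero_one_loss` are illustrative only.

```python
import math

DIM = 8  # toy embedding dimension (assumption)


def one_hot(i, dim=DIM):
    v = [0.0] * dim
    v[i] = 1.0
    return v


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical hierarchical embeddings: entity-specific direction + shared concept direction.
COUNTRY = one_hot(6)  # shared "country" concept
CAPITAL = one_hot(7)  # shared "capital city" concept
PAIRS = {"France": "Paris", "Japan": "Tokyo", "Italy": "Rome"}

vocab = {}
for idx, (country, capital) in enumerate(PAIRS.items()):
    specific = one_hot(idx)  # direction shared by a country and its capital
    vocab[country] = add(specific, COUNTRY)
    vocab[capital] = add(specific, CAPITAL)


def task_vector(demos):
    """Average (answer - query) difference over in-context demonstrations."""
    diffs = [sub(vocab[y], vocab[x]) for x, y in demos]
    return [sum(col) / len(diffs) for col in zip(*diffs)]


def predict(query, t):
    """Add the task vector to the query and decode by nearest cosine neighbor."""
    target = add(vocab[query], t)
    candidates = [w for w in vocab if w != query]
    return max(candidates, key=lambda w: cosine(vocab[w], target))


def zero_one_loss(demos, queries):
    """Fraction of (query, answer) pairs whose decoded prediction is wrong."""
    t = task_vector(demos)
    errors = sum(predict(x, t) != y for x, y in queries)
    return errors / len(queries)


if __name__ == "__main__":
    t = task_vector([("France", "Paris"), ("Japan", "Tokyo")])
    print(predict("Italy", t))  # Rome
```

In this idealized setup the task vector reduces exactly to CAPITAL minus COUNTRY. Adding it to "Italy" swaps the concept direction and keeps the entity direction, so the 0-1 loss on held-out countries is zero. Real embeddings are noisy, and the paper's analysis concerns what a trained nonlinear residual transformer learns, which this sketch does not model.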
