Provable In-Context Vector Arithmetic via Retrieving Task Concepts

Dake Bu; Wei Huang; Andi Han; Atsushi Nitanda; Qingfu Zhang; Hau-San Wong; Taiji Suzuki

arXiv:2508.09820 · cs.LG · August 14, 2025

TL;DR

This paper develops a theoretical framework explaining how transformers solve factual-recall in-context learning tasks through Word2Vec-like vector arithmetic on a latent task vector, and supports its robustness and generalization claims with both proofs and simulations.

Contribution

The paper introduces a theoretical model of in-context learning built on hierarchical concept modeling, showing how transformers perform vector arithmetic for factual recall and generalize under distribution shifts.

Findings

01 Proves 0-1 loss convergence for nonlinear residual transformers trained by gradient descent on cross-entropy loss

02 Shows robustness to concept recombination and distribution shifts

03 Supports the theoretical results with empirical simulations

Abstract

In-context learning (ICL) has garnered significant attention for its ability to grasp functions/tasks from demonstrations. Recent studies suggest the presence of a latent task/function vector in LLMs during ICL. Merullo et al. (2024) showed that LLMs leverage this vector alongside the residual stream for Word2Vec-like vector arithmetic, solving factual-recall ICL tasks. Additionally, recent work empirically highlighted the key role of Question-Answer data in enhancing factual-recall capabilities. Despite these insights, a theoretical explanation remains elusive. To move one step forward, we propose a theoretical framework building on empirically grounded hierarchical concept modeling. We develop an optimization theory, showing how nonlinear residual transformers trained via gradient descent on cross-entropy loss perform factual-recall ICL tasks via vector arithmetic. We prove 0-1 loss…
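The Word2Vec-like vector arithmetic mentioned in the abstract can be illustrated with a minimal toy sketch. This is not the paper's model or construction: the four-dimensional embeddings, the "capital_of" task axis, and the helper names `estimate_task_vector` and `recall` are all assumptions made up for illustration. The idea shown is only that a task vector can be estimated from demonstration pairs and then added to a new query's embedding to retrieve the answer.

```python
import math

# Toy embedding space (an assumption for illustration, not from the paper):
# axes 0-2 encode entity identity, axis 3 encodes a hypothetical
# "capital_of" task concept shared by all answer words.
EMBEDDINGS = {
    "France": [1.0, 0.0, 0.0, 0.0],
    "Japan":  [0.0, 1.0, 0.0, 0.0],
    "Egypt":  [0.0, 0.0, 1.0, 0.0],
    "Paris":  [1.0, 0.0, 0.0, 1.0],
    "Tokyo":  [0.0, 1.0, 0.0, 1.0],
    "Cairo":  [0.0, 0.0, 1.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def estimate_task_vector(demos):
    """Average the answer-minus-query offsets over the demonstration pairs."""
    dim = len(next(iter(EMBEDDINGS.values())))
    total = [0.0] * dim
    for query, answer in demos:
        for i in range(dim):
            total[i] += EMBEDDINGS[answer][i] - EMBEDDINGS[query][i]
    return [t / len(demos) for t in total]


def recall(query, task_vector):
    """Add the task vector to the query embedding; return the nearest other word."""
    target = [q + t for q, t in zip(EMBEDDINGS[query], task_vector)]
    candidates = [word for word in EMBEDDINGS if word != query]
    return max(candidates, key=lambda word: cosine(target, EMBEDDINGS[word]))


# In-context demonstrations (query, answer), followed by an unseen query.
demos = [("France", "Paris"), ("Japan", "Tokyo")]
task_vector = estimate_task_vector(demos)
prediction = recall("Egypt", task_vector)
print(prediction)  # prints "Cairo"
```

In this toy setup the averaged offset isolates the shared task axis, so adding it to the embedding of "Egypt" lands exactly on "Cairo". Real model representations are noisy and high-dimensional, which is where the paper's convergence and robustness analysis comes in.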
