Task Vector Geometry Underlies Dual Modes of Task Inference in Transformers

Hao Yan; Haolin Yang; Yiqiao Zhong

arXiv:2605.03780·cs.LG·May 6, 2026

TL;DR

This paper explores how task-vector geometry in transformers influences their ability to recognize seen tasks and generalize to new ones, linking internal representations to external behavior.

Contribution

The paper provides a mathematical analysis of task-vector geometry in small transformers trained from scratch on synthetic latent-task sequence data, explaining both the in-distribution and the out-of-distribution inference modes.

Findings

01

In-distribution inference uses convex combinations of task vectors.

02

Out-of-distribution inference involves extrapolative learning in orthogonal subspaces.

03

Task-vector geometry is shaped by the training distribution and is closely tied to out-of-distribution generalization.
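
To make the geometric picture in the findings concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's model or method: the two task vectors, the 4-dimensional space, the softmax weights, and the helper names (`softmax`, `combine`, `split`) are all hypothetical, and the task vectors are taken to be orthonormal so that projection reduces to dot products. The sketch builds an "in-distribution" representation as a convex combination of task vectors, then adds a component orthogonal to their span to mimic an "out-of-distribution" representation, and finally splits each one back into the two parts.

```python
import math

def softmax(logits):
    """Turn arbitrary scores into nonnegative weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def combine(weights, vectors):
    """Weighted sum of vectors (a convex combination if weights are convex)."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def split(h, basis):
    """Split h into coefficients on an ORTHONORMAL basis plus a residual.

    Assumption: basis vectors are orthonormal, so the projection
    coefficients are plain dot products.
    """
    coeffs = [dot(h, b) for b in basis]
    projection = combine(coeffs, basis)
    residual = [x - p for x, p in zip(h, projection)]
    return coeffs, residual

# Hypothetical task vectors for two training tasks (orthonormal by construction).
task_vectors = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]

# In-distribution sketch: the representation is a convex combination.
weights = softmax([2.0, 0.5])
h_in = combine(weights, task_vectors)

# Out-of-distribution sketch: add a component outside the task-vector span.
novel_direction = [0.0, 0.0, 1.0, 0.0]  # hypothetical, orthogonal to both tasks
h_ood = [a + 0.7 * b for a, b in zip(h_in, novel_direction)]

coeffs_in, residual_in = split(h_in, task_vectors)
coeffs_ood, residual_ood = split(h_ood, task_vectors)

print("convex weights:", weights)
print("in-distribution residual:", residual_in)
print("out-of-distribution residual:", residual_ood)
```

In this toy setup the in-distribution representation has no residual outside the task-vector span, while the out-of-distribution one carries a residual that is orthogonal to every task vector. With non-orthonormal task vectors, the dot-product projection in `split` would need to be replaced by a least-squares fit.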

Abstract

Transformers are effective at inferring the latent task from context via two inference modes: recognizing a task seen during training, and adapting to a novel one. Recent interpretability studies have identified from middle-layer representations task-specific directions, or task vectors, that steer model behavior. However, a lack of rigorous foundations hinders connecting internal representations to external model behavior: existing work fails to explain how task-vector geometry is shaped by the training distribution, and what geometry enables out-of-distribution (OOD) generalization. In this paper, we study these questions in a controlled synthetic setting by training small transformers from scratch on latent-task sequence distributions, which allows a principled mathematical characterization. We show that two inference modes can coexist within a single model. In-distribution behavior…
