Task Generalization With AutoRegressive Compositional Structure: Can Learning From $D$ Tasks Generalize to $D^{T}$ Tasks?

Amirhesam Abedsoltan; Huaqing Zhang; Kaiyue Wen; Hongzhou Lin; Jingzhao Zhang; Mikhail Belkin

arXiv:2502.08991 · cs.LG · June 10, 2025

TL;DR

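This paper studies when a model trained on a small set of tasks can generalize to an exponentially larger task family. Tasks are modeled with an autoregressive compositional structure: each task composes $T$ operations, each drawn from $D$ subtasks, giving $D^{T}$ tasks in total. The analysis is supported by theory and by experiments on sparse parity, arithmetic, and translation.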

Contribution

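The paper introduces a theoretical framework showing that training on only $O(D)$ tasks can suffice to generalize to all $D^{T}$ tasks in an autoregressive compositional family, and it demonstrates this exponential task generalization empirically on parity, arithmetic, and translation tasks.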

Findings

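01. Theoretically, training on $O(D)$ tasks can generalize to all $D^{T}$ tasks.

02. Transformers achieve this exponential task generalization on sparse parity functions via in-context learning (ICL) and chain-of-thought (CoT) reasoning.

03. The same generalization is observed empirically on arithmetic and translation tasks.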

Abstract

Large language models (LLMs) exhibit remarkable task generalization, solving tasks they were never explicitly trained on with only a few demonstrations. This raises a fundamental question: when can learning from a small set of tasks generalize to a large task family? In this paper, we investigate task generalization through the lens of autoregressive compositional structure, where each task is a composition of $T$ operations, and each operation is among a finite family of $D$ subtasks. This yields a total class of size $D^{T}$. We first show that generalization to all $D^{T}$ tasks is theoretically achievable by training on only $O(D)$ tasks. Empirically, we demonstrate that Transformers achieve such exponential task generalization on sparse parity functions via in-context learning (ICL) and chain-of-thought (CoT) reasoning. We further show generalization in arithmetic and…
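To make the counting in the abstract concrete, the sketch below enumerates a toy compositional task family and evaluates one sparse parity task step by step. It is an illustration only, not the paper's code: the names `task_family_size` and `parity_with_cot`, the choice of $D = 4$ and $T = 3$, and the encoding of a task as a tuple of coordinate indices are all assumptions made for this example. The running XOR values stand in for chain-of-thought steps.

```python
from itertools import product


def task_family_size(D: int, T: int) -> int:
    """Number of tasks when each of T steps picks one of D subtasks."""
    return D ** T


def parity_with_cot(x, secret_indices):
    """Return the running XOR after each step (hypothetical CoT trace).

    The last entry is the parity of x restricted to secret_indices.
    """
    steps = []
    acc = 0
    for i in secret_indices:
        acc ^= x[i]          # one "operation": fold in one chosen coordinate
        steps.append(acc)    # intermediate result exposed as a reasoning step
    return steps


# Toy sizes (assumed for illustration): D = 4 coordinates, T = 3 steps.
D, T = 4, 3

# Every task is a length-T sequence of subtask choices, so there are D**T.
all_tasks = list(product(range(D), repeat=T))
assert len(all_tasks) == task_family_size(D, T)

# One task from the family, evaluated on one input.
x = [1, 0, 1, 1]
trace = parity_with_cot(x, (0, 2, 3))
print(trace)  # running XOR values; the final one is the label
```

With these toy sizes the family already has 64 tasks while each step chooses among only 4 subtasks, which is the gap between $O(D)$ and $D^{T}$ that the abstract refers to. Counting ordered tuples with repetition follows the abstract's $D^{T}$ count; for parity specifically, repeated indices cancel, so the number of distinct parity functions is smaller.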

Videos

Task Generalization With AutoRegressive Compositional Structure: Can Learning From $D$ Tasks Generalize to $D^{T}$ Tasks? (SlidesLive)

Taxonomy

Topics: Neural Networks and Applications

Methods: Sparse Evolutionary Training