Principled Understanding of Generalization for Generative Transformer Models in Arithmetic Reasoning Tasks

Xingcheng Xu; Zibo Zhao; Haipeng Zhang; Yanqing Yang

arXiv:2407.17963 · cs.LG · August 7, 2025

TL;DR

This paper presents a theoretical framework to understand how transformer models generalize in arithmetic reasoning tasks, emphasizing the role of task structure and positional encoding in length generalization.

Contribution

It introduces a unified theory linking positional encoding and task structure to transformer generalization, validated through experiments on GPT models.
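
The link between positional encoding and task structure can be made concrete with a generic relative-position bias, sketched below in plain Python. This is an illustrative assumption, not the encoding used in the paper's GPT experiments. The bias for query position i and key position j is looked up only by the offset i - j, and the offset table is hypothetical. Because no absolute index appears, the bias matrix is shift-invariant (Toeplitz), and the same table defines the bias for sequences longer than any seen in training. A learned absolute position embedding, by contrast, typically has no trained entry for unseen positions.

```python
def relative_bias(n: int, table: dict[int, float]) -> list[list[float]]:
    """Build an n x n additive attention bias (intended to be added to the
    attention logits); entry [i][j] depends only on the offset i - j.
    Offsets absent from `table` default to 0.0 (a hypothetical choice)."""
    return [[table.get(i - j, 0.0) for j in range(n)] for i in range(n)]


def is_shift_invariant(m: list[list[float]]) -> bool:
    """True if m[i][j] == m[i + 1][j + 1] for all valid i, j (Toeplitz)."""
    n = len(m)
    return all(m[i][j] == m[i + 1][j + 1] for i in range(n - 1) for j in range(n - 1))


# Hypothetical offset table: self, previous token, next token.
TABLE = {0: 1.0, 1: 0.5, -1: -0.5}
short_bias = relative_bias(4, TABLE)   # a "training" length
long_bias = relative_bias(12, TABLE)   # a longer, unseen length
```

Both matrices are shift-invariant, and the top-left 4 x 4 block of `long_bias` equals `short_bias`: the parameters are indexed by offset, so nothing new has to be learned when the sequence grows.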

Findings

01. Translation invariance in addition aids length generalization
02. Base mismatch in modular operations causes generalization failure
03. The framework accurately predicts transformer behavior in arithmetic tasks
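
The translation-invariance finding can be illustrated with a small Python sketch. It illustrates the arithmetic property only and is not the authors' code or model; the names `add_digitwise` and `shift_equivariant` are hypothetical. Schoolbook addition applies one local rule at every digit position (two aligned digits plus an incoming carry give an output digit and an outgoing carry), so the rule that suffices at short lengths is exactly the rule needed at longer ones.

```python
def add_digitwise(a: str, b: str) -> str:
    """Add two non-negative decimal integers given as digit strings,
    applying the same local rule at every digit position."""
    # Reverse so index 0 is the least significant digit.
    ra, rb = a[::-1], b[::-1]
    out: list[str] = []
    carry = 0
    for i in range(max(len(ra), len(rb))):
        da = int(ra[i]) if i < len(ra) else 0
        db = int(rb[i]) if i < len(rb) else 0
        # The rule sees only the two aligned digits and the carry,
        # never the absolute position i.
        s = da + db + carry
        out.append(str(s % 10))
        carry = s // 10
    if carry:
        out.append(str(carry))
    return "".join(reversed(out))


def shift_equivariant(a: str, b: str, k: int) -> bool:
    """Appending k zeros to both operands appends k zeros to the sum."""
    pad = "0" * k
    return add_digitwise(a + pad, b + pad) == add_digitwise(a, b) + pad
```

For example, `add_digitwise("999", "1")` returns `"1000"`, and the same function handles 40-digit operands without modification. Shifting both operands by k positions shifts the result by k positions, which is the kind of structure a relative positional encoding can exploit.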

Abstract

Transformer-based models excel in various tasks, but their generalization capabilities, especially in arithmetic reasoning, are not yet fully understood. Arithmetic tasks provide a controlled framework to explore these capabilities, yet performance anomalies persist, such as inconsistent effectiveness in multiplication and erratic generalization in modular addition (e.g., modulo 100 vs. 101). This paper develops a unified theoretical framework for understanding the generalization behaviors of transformers in arithmetic tasks, focusing on length generalization. Through detailed analysis of addition, multiplication, and modular operations, we reveal that translation invariance in addition aligns with relative positional encoding for robust generalization, while base mismatch in modular operations disrupts this alignment. Experiments across GPT-family models validate our framework,…
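
The modulo 100 vs. 101 contrast rests on a simple arithmetic fact, sketched below in Python. This is our illustration, not the paper's code, and `determined_by_low_digits` is a hypothetical helper. Because 100 = 10^2, (a + b) mod 100 depends only on the last two decimal digits of a and b, whereas (a + b) mod 101 can change when only higher digits change. Read this way, a modulus that is not aligned with the base-10 digit representation makes the output depend on every input digit, which is one way to understand the "base mismatch" described above.

```python
def determined_by_low_digits(modulus: int, k: int, limit: int = 300) -> bool:
    """Return True if, for all 0 <= a, b < limit, (a + b) % modulus equals
    the same operation applied to only the last k decimal digits of a and b."""
    base_k = 10 ** k
    for a in range(limit):
        for b in range(limit):
            full = (a + b) % modulus
            # Keep only the k least significant digits of each operand.
            truncated = (a % base_k + b % base_k) % modulus
            if full != truncated:
                return False  # counterexample: higher digits matter
    return True
```

`determined_by_low_digits(100, 2)` returns `True`, while `determined_by_low_digits(101, 2)` returns `False` (for a = 0, b = 100 the full result is 100 mod 101 = 100, but truncation to two digits gives 0). The check is exhaustive only over the given range, so a `True` result is evidence rather than proof; for modulus 100 the property also follows directly from 100 dividing a - (a mod 100).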

Videos

Principled Understanding of Generalization for Generative Transformer Models in Arithmetic Reasoning Tasks (Underline)

Taxonomy

Topics: Mathematics Education and Teaching Techniques

Methods: Softmax · Attention Is All You Need