Unraveling Arithmetic in Large Language Models: The Role of Algebraic Structures

Fu-Chieh Chang; You-Chen Lin; Pei-Yuan Wu

arXiv:2411.16260 · cs.LG · April 22, 2025

Open Access

TL;DR

This paper investigates how large language models perform arithmetic by learning algebraic structures, providing empirical and theoretical evidence that such structures enable better generalization and understanding of arithmetic reasoning.

Contribution

It introduces the idea that LLMs learn arithmetic by capturing algebraic structures such as commutativity and identity, supported by experiments on arithmetic datasets and a theoretical analysis of transformer embeddings.

Findings

1. LLMs can learn algebraic properties from data.
2. Transformer embeddings can be invariant to permutations and identity elements.
3. Leveraging algebraic structures improves LLMs' arithmetic reasoning.
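The second finding says transformer embeddings can be invariant to permutations. A minimal sketch of why that is achievable, assuming a single self-attention head with no positional encodings followed by mean pooling (our own simplification; the paper's weight construction is not reproduced here). The names `attention_pool`, `embed`, and the random weights are hypothetical. Swapping the operands of `3 + 5` yields the same pooled embedding up to floating-point rounding.

```python
import math
import random

# Illustrative toy model (hypothetical, not the paper's construction):
# one self-attention head with NO positional encodings, then mean pooling.


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def softmax(xs):
    top = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(tokens, wq, wk, wv):
    """Map a sequence of token vectors to a single embedding vector."""
    qs = [matvec(wq, t) for t in tokens]
    ks = [matvec(wk, t) for t in tokens]
    vs = [matvec(wv, t) for t in tokens]
    scale = math.sqrt(len(ks[0]))
    outputs = []
    for q in qs:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in ks]
        weights = softmax(scores)
        # Each query's output depends only on the set of (key, value) pairs,
        # so reordering tokens does not change it.
        outputs.append([sum(w * v[i] for w, v in zip(weights, vs))
                        for i in range(len(vs[0]))])
    # Mean pooling ignores order, so reordering the tokens only reorders
    # `outputs` and leaves the pooled vector the same (up to float rounding).
    n = len(outputs)
    return [sum(o[i] for o in outputs) / n for i in range(len(outputs[0]))]


random.seed(0)
DIM = 4


def rand_matrix():
    return [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]


wq, wk, wv = rand_matrix(), rand_matrix(), rand_matrix()
embed = {tok: [random.uniform(-1, 1) for _ in range(DIM)] for tok in ["3", "+", "5"]}

a_plus_b = attention_pool([embed["3"], embed["+"], embed["5"]], wq, wk, wv)
b_plus_a = attention_pool([embed["5"], embed["+"], embed["3"]], wq, wk, wv)
max_diff = max(abs(x - y) for x, y in zip(a_plus_b, b_plus_a))
print(max_diff)  # on the order of floating-point rounding error
```

Dropping positional information is exactly what prevents this toy from distinguishing `3 - 5` from `5 - 3`, so it only suits commutative operations; real LLMs use positional encodings, and invariance there must come from how the weights are configured rather than from this shortcut.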

Abstract

Large language models (LLMs) have demonstrated remarkable mathematical capabilities, largely driven by chain-of-thought (CoT) prompting, which decomposes complex reasoning into step-by-step solutions. This approach has enabled significant advancements, as evidenced by performance on benchmarks like GSM8K and MATH. However, the mechanisms underlying LLMs' ability to perform arithmetic in a single step of CoT remain poorly understood. Existing studies debate whether LLMs encode numerical values or rely on symbolic reasoning, while others explore attention and multi-layered processing in arithmetic tasks. In this work, we propose that LLMs learn arithmetic by capturing algebraic structures, such as commutativity and identity properties. Since these structures are observable through input-output relationships, they can generalize to unseen data. We empirically demonstrate that LLMs can…
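The abstract argues that algebraic structures are observable through input-output relationships alone. A minimal sketch of such a black-box probe, assuming arithmetic modulo 7 as a stand-in task; `check_properties` and the two toy operators are hypothetical, and in practice `op` could wrap a model's predicted answers instead of exact arithmetic.

```python
# Hypothetical probe: test commutativity and the identity property using
# only input-output pairs of an operator (e.g. a model's answers).

MOD = 7


def modular_add(a, b):
    return (a + b) % MOD


def modular_sub(a, b):
    return (a - b) % MOD


def check_properties(op, elements, identity):
    """Report which algebraic properties `op` satisfies on `elements`."""
    elements = list(elements)
    commutative = all(op(a, b) == op(b, a) for a in elements for b in elements)
    has_identity = all(op(a, identity) == a and op(identity, a) == a
                       for a in elements)
    return {"commutative": commutative, "identity": has_identity}


print(check_properties(modular_add, range(MOD), identity=0))
# {'commutative': True, 'identity': True}
print(check_properties(modular_sub, range(MOD), identity=0))
# {'commutative': False, 'identity': False}
```

Subtraction fails both checks because `1 - 2` and `2 - 1` differ, and `0 - 1` is 6 rather than 1, so 0 is only a right identity.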

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Softmax · Attention Is All You Need