Transformers know more than they can tell -- Learning the Collatz sequence

François Charton; Ashvni Narayanan

arXiv:2511.10811 · cs.LG · November 17, 2025

TL;DR

This paper studies how transformers learn to predict long Collatz steps. The models pick up an underlying mathematical property, the length of the loops involved in the computation, which has implications for understanding how models reason and learn control structures.

Contribution

The study shows that transformers learn to predict long Collatz steps one class of inputs at a time. Each class groups inputs with the same residue modulo a power of two, which corresponds to a loop length in the computation, giving insight into the algorithms models learn for complex arithmetic functions.

Findings

01

Accuracy depends on the base used to encode inputs and outputs: up to 99.7% for bases 24 and 32, but only 37% and 25% for bases 11 and 3.

02

Whatever the base, models learn inputs class by class, where each class shares the same residue modulo a power of two.

03

Most errors follow predictable patterns and come from misestimating loop lengths.

Abstract

We investigate transformer prediction of long Collatz steps, a complex arithmetic function that maps odd integers to their distant successors in the Collatz sequence ($u_{n+1} = u_{n}/2$ if $u_{n}$ is even, $u_{n+1} = (3 u_{n} + 1)/2$ if $u_{n}$ is odd). Model accuracy varies with the base used to encode input and output. It can be as high as $99.7\%$ for bases $24$ and $32$, and as low as $37\%$ and $25\%$ for bases $11$ and $3$. Yet, all models, no matter the base, follow a common learning pattern. As training proceeds, they learn a sequence of classes of inputs that share the same residual modulo $2^{p}$. Models achieve near-perfect accuracy on these classes, and less than $1\%$ for all other inputs. This maps to a mathematical property of Collatz sequences: the length of the loops involved in the computation of a long Collatz step can be deduced from the binary representation of its input. The…
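The link between loop lengths and the binary representation can be illustrated with a short Python sketch. The abstract as quoted here does not spell out the exact definition of a long Collatz step, so the version below is an assumption: starting from an odd integer, apply $(3u + 1)/2$ while the value is odd (a first loop of length `k`), then halve while it is even (a second loop of length `k_prime`). The function names `collatz_step`, `trailing_ones`, and `long_collatz_step` are hypothetical and chosen for illustration only.

```python
def collatz_step(u: int) -> int:
    """One step of the sequence as defined in the abstract."""
    return u // 2 if u % 2 == 0 else (3 * u + 1) // 2


def trailing_ones(n: int) -> int:
    """Count the consecutive 1 bits at the low end of n's binary form."""
    count = 0
    while n & 1:
        count += 1
        n >>= 1
    return count


def long_collatz_step(n: int) -> tuple[int, int, int]:
    """Assumed definition of a long step (not confirmed by the source):
    run the odd branch until the value is even, then halve until it is odd.
    Returns (odd successor, k, k_prime), where k and k_prime are loop lengths.
    """
    if n <= 0 or n % 2 == 0:
        raise ValueError("input must be a positive odd integer")
    u, k = n, 0
    while u % 2 == 1:  # first loop: (3u + 1) / 2 steps
        u = collatz_step(u)
        k += 1
    k_prime = 0
    while u % 2 == 0:  # second loop: halving steps
        u = collatz_step(u)
        k_prime += 1
    return u, k, k_prime


if __name__ == "__main__":
    for n in (5, 7, 27):
        succ, k, k_prime = long_collatz_step(n)
        print(f"n={n} ({n:b}): successor={succ}, k={k}, "
              f"k'={k_prime}, trailing ones={trailing_ones(n)}")
```

Under this definition, the first loop length equals the number of trailing 1 bits of the input: writing $n = 2^{k} m - 1$ with $m$ odd, each odd step lowers the power of two by one until the value $3^{k} m - 1$ is even. Inputs that share a residue modulo a sufficiently large power of two therefore share the same `k`, which is consistent with the class-by-class learning pattern the abstract describes.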
