A model of errors in transformers
Suvrat Raju, Praneeth Netrapalli

TL;DR
This paper presents a simplified two-parameter model inspired by effective field theory to analyze and predict error rates in large language models on deterministic tasks, supported by extensive empirical validation.
Contribution
The paper introduces a two-parameter quantitative relationship between an LLM's accuracy and the complexity of the task, offering insight into how errors accumulate and how prompt design can reduce them.
Findings
Excellent agreement between the two-parameter model's predictions and empirical data from Gemini 2.5 Flash, Gemini 2.5 Pro and DeepSeek R1.
Identification of deviations indicating limits of the model.
Demonstration of prompt construction to lower error rates.
Abstract
We study the error rate of LLMs on tasks like arithmetic that require a deterministic output, and repetitive processing of tokens drawn from a small set of alternatives. We argue that incorrect predictions arise when small errors in the attention mechanism accumulate to cross a threshold, and use this insight to derive a quantitative two-parameter relationship between the accuracy and the complexity of the task. The two parameters vary with the prompt and the model; they can be interpreted in terms of an elementary noise rate, and the number of plausible erroneous tokens that can be predicted. Our analysis is inspired by an "effective field theory" perspective: the LLM's many raw parameters can be reorganized into just two parameters that govern the error rate. We perform extensive empirical tests, using Gemini 2.5 Flash, Gemini 2.5 Pro and DeepSeek R1, and find excellent agreement…
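The threshold picture in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's formula, which is not reproduced in this summary. It assumes, purely for illustration, that each processed token adds independent Gaussian noise toward each plausible erroneous token, and that an error occurs if the accumulated noise in any of those directions exceeds a threshold fixed at 1. The function name `toy_accuracy` and its parameters `complexity`, `noise_rate` and `num_alternatives` are hypothetical.

```python
import math


def toy_accuracy(complexity: int, noise_rate: float, num_alternatives: int) -> float:
    """Toy accuracy under an assumed Gaussian noise-accumulation picture.

    Assumptions (illustrative only, not taken from the paper):
      - each of `complexity` tokens adds independent N(0, noise_rate^2)
        noise toward each of `num_alternatives` erroneous tokens;
      - the accumulated noise per direction is N(0, complexity * noise_rate^2);
      - the output is wrong if any direction exceeds a threshold of 1.
    """
    if noise_rate <= 0:
        raise ValueError("noise_rate must be positive")
    if complexity <= 0:
        return 1.0  # nothing to process, so no noise accumulates

    # Standard deviation of the accumulated noise after `complexity` tokens.
    spread = noise_rate * math.sqrt(complexity)

    # Probability that one direction stays below the threshold:
    # the standard normal CDF evaluated at 1 / spread.
    stay_below = 0.5 * (1.0 + math.erf(1.0 / (spread * math.sqrt(2.0))))

    # All erroneous directions must stay below the threshold independently.
    return stay_below ** num_alternatives


if __name__ == "__main__":
    for c in (1, 10, 100, 1000):
        print(c, toy_accuracy(c, noise_rate=0.05, num_alternatives=3))
```

In this toy, accuracy falls as the task gets longer and as more erroneous tokens become plausible, which is the qualitative behavior the abstract describes. The functional form derived by the authors may differ.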
Taxonomy
Topics: Cognitive and developmental aspects of mathematical skills · Neural and Behavioral Psychology Studies · Reading and Literacy Development
