Arithmetic with Language Models: from Memorization to Computation
Davide Maltoni, Matteo Ferrara

TL;DR
This paper explores how a language model trained on next-token prediction can perform arithmetic operations such as binary addition and multiplication, examining its internal mechanisms and its ability to extrapolate beyond the training data.
Contribution
It shows that a light language model can learn binary addition and multiplication, and provides insights into how the model encodes its inputs and carries out the computation internally.
Findings
The model can generalize arithmetic beyond its training data
Internal representations support an Encoding-Regression-Decoding process
Extrapolation capabilities depend on processing in an internal value space
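The Encoding-Regression-Decoding idea named in the findings can be pictured with a small hand-written analogy. This is only a conceptual sketch, not the computation the trained model learns: the function names (`encode`, `regress`, `decode`, `erd`), the bit-string format, and the output widths are all assumptions made for illustration.

```python
def encode(bits):
    """Encoding: map a string of '0'/'1' tokens to a single numeric value."""
    value = 0
    for bit in bits:
        value = value * 2 + (1 if bit == "1" else 0)
    return value


def regress(a, b, op):
    """Regression: compute the result in value space rather than token space."""
    return a + b if op == "+" else a * b


def decode(value, width):
    """Decoding: map a numeric value back to a fixed-width string of bit tokens."""
    return format(value, "0{}b".format(width))


def erd(a_bits, b_bits, op):
    """Chain the three stages for one operation ('+' or '*')."""
    # Assumed output widths: one extra bit for a sum, combined width for a product.
    width = len(a_bits) + 1 if op == "+" else len(a_bits) + len(b_bits)
    return decode(regress(encode(a_bits), encode(b_bits), op), width)


print(erd("0111", "0001", "+"))
print(erd("0011", "0011", "*"))
```

In this picture the arithmetic happens on values, and tokens appear only at the input and output boundaries, which is the intuition behind the finding that extrapolation depends on value-space processing.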
Abstract
A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance to further improve them and broaden their applicability. This work investigates how a language model, trained to predict the next token, can perform arithmetic computations generalizing beyond training data. Binary addition and multiplication constitute a good testbed for this purpose, since they require a very small vocabulary and exhibit relevant input/output discontinuities making smooth input interpolation ineffective for novel data. We successfully trained a light language model to learn these tasks and ran a number of experiments to investigate the extrapolation capabilities and internal information processing. Our findings support the hypothesis that the language model works as an Encoding-Regression-Decoding machine where the computation…
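To make the "small vocabulary" and "input/output discontinuities" points concrete, here is a minimal sketch of how binary arithmetic examples might be written as token strings. The string layout (`operand op operand = result`), the five-symbol vocabulary, and the helper names are assumptions for illustration and are not taken from the paper.

```python
VOCAB = ["0", "1", "+", "*", "="]  # assumed vocabulary, five symbols


def to_bits(value, width):
    """Return value as a fixed-width binary string, most significant bit first."""
    return format(value, "0{}b".format(width))


def make_example(a, b, op, width):
    """Build one hypothetical training string such as '0111+0001=01000'."""
    result = a + b if op == "+" else a * b
    # A sum of two width-bit numbers fits in width + 1 bits, a product in 2 * width.
    out_width = width + 1 if op == "+" else 2 * width
    return "{}{}{}={}".format(
        to_bits(a, width), op, to_bits(b, width), to_bits(result, out_width)
    )


def hamming(x, y):
    """Count the positions at which two equal-length strings differ."""
    return sum(c1 != c2 for c1, c2 in zip(x, y))


ex_a = make_example(6, 1, "+", 4)  # '0110+0001=00111'
ex_b = make_example(7, 1, "+", 4)  # '0111+0001=01000'
in_a, out_a = ex_a.split("=")
in_b, out_b = ex_b.split("=")

print(ex_a, ex_b)
print("input bits changed:", hamming(in_a, in_b))    # 1
print("output bits changed:", hamming(out_a, out_b)) # 4
```

Flipping a single input bit (6 to 7) changes four of the five output bits because of carry propagation, which is the kind of discontinuity that makes smooth interpolation between nearby training inputs ineffective.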
Peer Reviews
Decision · Submitted to ICLR 2024
* Exploring the limitations of Large Language Models (LLMs) and the transferability of findings, especially in arithmetic tasks, is both relevant and crucial for the community at this time.
* The experimental setup is quite basic, and it's unclear how these findings apply to current Large Language Models. The research primarily focuses on binary addition and multiplication using a simplistic model, which might not be representative of more complex, real-world scenarios.
* The paper could benefit from clearer writing. Specifically, the abstract and introduction lack clarity regarding the nature of the investigation. It's not immediately apparent what the central findings are, the e
This paper tackles an important and interesting question. The simplified setting which is analysed here allows the authors to isolate training/optimisation issues (to some extent) and analyse the strategy used by the models to perform arithmetic tasks. The experiments in the paper are clearly written, and give important insights about how language models implement arithmetic tasks.
The main weakness of this paper, in my opinion, is that it does not engage with the model interpretability literature (neither “mechanistic interpretability” nor “probing”).
* It cites a single probing paper on probing numeracy in embeddings, which is a highly relevant topic here, but many more exist (e.g., Naik et al. 2019, Sundararaman et al. 2020).
* It cites no work on probing, many of which have discussed techniques similar to those presented here. E.g., manipulating datasets on which a mo
- Clean experiments, with each section supporting the claim the authors want to make.
- Explains how arithmetic operations work inside a language model, showing that such models act as encoding-regression-decoding machines.
- Explores this only for a very simple LM; it is not clear whether the results generalize to large LLMs. Showing results on different open-source LLMs might help make the claim stronger.
- Lack of novel insights: after reading the paper, I accept that these models act as encoding-regression-decoding machines, but I don't know how to use this information to build models that are better at arithmetic computation.
Taxonomy
Topics: Topic Modeling · Model Reduction and Neural Networks · Machine Learning and Algorithms
