Investigating the Limitations of Transformers with Simple Arithmetic Tasks

Rodrigo Nogueira; Zhiying Jiang; Jimmy Lin

arXiv:2102.13019 · cs.CL · April 14, 2021 · 31 cites

TL;DR

This paper examines how the surface form of numbers affects the ability of transformer-based language models to learn simple arithmetic tasks, revealing that representation choices significantly impact accuracy and generalization.

Contribution

It demonstrates that suitable surface representations of numbers, such as explicit position tokens, enable models to learn and generalize arithmetic operations, and it highlights limitations of current tokenization and positional encoding methods.

Findings

01 Number representation strongly influences model accuracy

02 Position tokens improve learning of long-number arithmetic

03 Models fail to learn addition rules that are independent of the number lengths seen during training

Abstract

The ability to perform arithmetic tasks is a remarkable trait of human intelligence and might form a critical component of more complex reasoning tasks. In this work, we investigate if the surface form of a number has any influence on how sequence-to-sequence language models learn simple arithmetic tasks such as addition and subtraction across a wide range of values. We find that how a number is represented in its surface form has a strong influence on the model's accuracy. In particular, the model fails to learn addition of five-digit numbers when using subwords (e.g., "32"), and it struggles to learn with character-level representations (e.g., "3 2"). By introducing position tokens (e.g., "3 10e1 2"), the model learns to accurately add and subtract numbers up to 60 digits. We conclude that modern pretrained language models can easily learn arithmetic from very few examples, as long as…
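To make the three surface forms named in the abstract concrete, here is a minimal Python sketch that renders a non-negative integer as a plain decimal string (which a subword tokenizer may split arbitrarily), as space-separated characters, and with position tokens in the "3 10e1 2" style quoted above. The function names are hypothetical, the position-token format is inferred only from that single example, and negative numbers are rejected by assumption; this is not the code from castorini/transformers-arithmetic.

```python
def to_subword(n: int) -> str:
    """Plain decimal form, e.g. 32 -> "32"; a subword tokenizer decides how to split it."""
    _check(n)
    return str(n)


def to_characters(n: int) -> str:
    """Character-level form: one token per digit, e.g. 32 -> "3 2"."""
    _check(n)
    return " ".join(str(n))


def to_position_tokens(n: int) -> str:
    """Position-token form: each digit is followed by a marker for its power of ten,
    except the units digit, e.g. 32 -> "3 10e1 2" and 832 -> "8 10e2 3 10e1 2".
    (Marker format assumed from the abstract's single example.)"""
    _check(n)
    digits = str(n)
    parts = []
    for i, d in enumerate(digits):
        power = len(digits) - 1 - i  # power of ten this digit represents
        parts.append(d)
        if power > 0:
            parts.append(f"10e{power}")
    return " ".join(parts)


def from_position_tokens(text: str) -> int:
    """Inverse of to_position_tokens: drop the 10e* markers and join the digits."""
    return int("".join(tok for tok in text.split() if not tok.startswith("10e")))


def _check(n: int) -> None:
    # Assumption for this sketch: only non-negative integers are handled.
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")


if __name__ == "__main__":
    for n in (32, 832):
        print(to_subword(n), "|", to_characters(n), "|", to_position_tokens(n))
    # 32 | 3 2 | 3 10e1 2
    # 832 | 8 3 2 | 8 10e2 3 10e1 2
```

The explicit markers tell the model which power of ten each digit carries, instead of leaving it to infer place value from the digit's position in the sequence.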

Code & Models

Repositories

castorini/transformers-arithmetic (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications · Artificial Intelligence in Games · Topic Modeling