Pre-trained Large Language Models Use Fourier Features to Compute Addition

Tianyi Zhou; Deqing Fu; Vatsal Sharan; Robin Jia

arXiv:2406.03445·cs.LG·June 6, 2024


TL;DR

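This paper shows that pre-trained large language models perform addition using Fourier features in their hidden states: low-frequency features estimate the magnitude of the answer, while high-frequency features handle modular addition (for example, whether the answer is even or odd). Models trained from scratch exploit only the low-frequency features and are less accurate, which highlights the importance of pre-training.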

Contribution

The paper uncovers the Fourier-feature mechanism that LLMs use for addition and demonstrates that pre-training is what enables models to use these features for precise arithmetic.

Findings

01. Pre-trained LLMs use Fourier features to perform addition.

02. Low-frequency features estimate the magnitude of the answer; high-frequency features handle modular addition, such as whether the answer is even or odd.

03. Pre-training is essential for models to exploit Fourier features effectively.

Abstract

Pre-trained large language models (LLMs) exhibit impressive mathematical reasoning capabilities, yet how they compute basic arithmetic, such as addition, remains unclear. This paper shows that pre-trained LLMs add numbers using Fourier features -- dimensions in the hidden state that represent numbers via a set of features sparse in the frequency domain. Within the model, MLP and attention layers use Fourier features in complementary ways: MLP layers primarily approximate the magnitude of the answer using low-frequency features, while attention layers primarily perform modular addition (e.g., computing whether the answer is even or odd) using high-frequency features. Pre-training is crucial for this mechanism: models trained from scratch to add numbers only exploit low-frequency features, leading to lower accuracy. Introducing pre-trained token embeddings to a randomly initialized model…
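The split between magnitude and modular information can be illustrated with a small sketch. This is not the model's actual computation or the authors' code: the periods in `PERIODS`, the function names, and the "coarse estimate plus residue" decoding step are all assumptions chosen for illustration. The sketch encodes an integer as cosine and sine values at a few periods, adds two numbers in feature space with the angle-addition identities, and then recovers the exact sum from a rough magnitude estimate combined with the residue mod 10.

```python
import numpy as np

# Illustrative periods only (an assumption, not taken from the paper).
PERIODS = (2, 5, 10, 100)

def fourier_features(x, periods=PERIODS):
    """Encode integer x as [cos(2*pi*x/T) for each T] + [sin(2*pi*x/T) for each T]."""
    angles = 2 * np.pi * x / np.asarray(periods, dtype=float)
    return np.concatenate([np.cos(angles), np.sin(angles)])

def add_features(fa, fb):
    """Features of a+b from features of a and b via the angle-addition identities."""
    n = len(fa) // 2
    ca, sa, cb, sb = fa[:n], fa[n:], fb[:n], fb[n:]
    return np.concatenate([ca * cb - sa * sb, sa * cb + ca * sb])

def parity_from_features(feats, periods=PERIODS):
    """Period-2 cosine is +1 for even x and -1 for odd x."""
    return 0 if feats[periods.index(2)] > 0 else 1

def residue_from_features(feats, period, periods=PERIODS):
    """Decode x mod period from the phase of the (cos, sin) pair for that period."""
    i, n = periods.index(period), len(periods)
    angle = np.arctan2(feats[n + i], feats[i])
    return int(round(angle * period / (2 * np.pi))) % period

def combine(coarse_estimate, residue, modulus=10):
    """Integer nearest to coarse_estimate that is congruent to residue mod modulus."""
    base = int(round((coarse_estimate - residue) / modulus))
    return base * modulus + residue

a, b = 37, 48
sum_feats = add_features(fourier_features(a), fourier_features(b))
residue = residue_from_features(sum_feats, 10)   # units digit of a + b
parity = parity_from_features(sum_feats)         # 1 means odd
# A hypothetical low-frequency estimate that is off by 2 still decodes exactly.
answer = combine(coarse_estimate=83.0, residue=residue)
```

In this toy setup the coarse estimate only needs to be within half a modulus of the true sum, because the high-frequency residue pins down the remaining digits. That mirrors the complementary roles the abstract describes, without claiming that the model decodes its answer this way.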

Videos

Pre-trained Large Language Models Use Fourier Features to Compute Addition (SlidesLive)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Sparse Evolutionary Training