Dissecting Multiplication in Transformers: Insights into LLMs

Luyu Qiu; Jianing Li; Chi Su; Chen Jason Zhang; Lei Chen

arXiv:2407.15360·cs.CL·July 23, 2024

TL;DR

This paper analyzes how transformers perform integer multiplication, identifies their limitations in handling carryovers, and proposes improvements that significantly boost accuracy, enhancing interpretability and trust in large language models.

Contribution

The paper provides a detailed analysis of transformers' shortcomings in multiplication and introduces targeted enhancements that improve performance and interpretability.

Findings

01

Transformers decompose multiplication into parallel subtasks.

02

Difficulty in calculating carryovers limits performance.

03

Proposed improvements achieve over 99.9% accuracy on 5-digit multiplication.
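To make the first two findings concrete, here is a minimal sketch of schoolbook multiplication written one output digit at a time. It is an illustration, not the paper's model or code. The function names `digit_subtasks` and `product_from_rows` are hypothetical. The point is that the per-position sum of digit products can be computed independently for every position, while the carry must be passed from each position to the next, which is the part the summary identifies as hard.

```python
def digit_subtasks(a: int, b: int, n_digits: int):
    """Multiply a * b one output digit at a time, least significant first.

    Returns one tuple per output position k:
    (raw_sum, carry_in, digit, carry_out).
    Hypothetical helper for illustration only.
    """
    # Operand digits, least significant first, zero-padded to n_digits.
    xs = [int(c) for c in reversed(str(a).zfill(n_digits))]
    ys = [int(c) for c in reversed(str(b).zfill(n_digits))]
    rows = []
    carry = 0
    for k in range(2 * n_digits):
        # Parallel part: digit products landing on position k.
        # This sum does not depend on any other output position.
        raw = sum(
            xs[i] * ys[k - i]
            for i in range(n_digits)
            if 0 <= k - i < n_digits
        )
        # Sequential part: the carry depends on every lower position.
        total = raw + carry
        digit, carry_out = total % 10, total // 10
        rows.append((raw, carry, digit, carry_out))
        carry = carry_out
    return rows


def product_from_rows(rows):
    """Reassemble the integer product from the per-position digits."""
    return sum(digit * 10**k for k, (_, _, digit, _) in enumerate(rows))


if __name__ == "__main__":
    for k, row in enumerate(digit_subtasks(47, 86, n_digits=2)):
        print(k, row)
    print(product_from_rows(digit_subtasks(47, 86, n_digits=2)))
```

For 47 × 86, position 1 has a raw sum of 80 and an incoming carry of 4, so a single wrong carry changes both that digit and every digit above it.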

Abstract

Transformer-based large language models have achieved remarkable performance across various natural language processing tasks. However, they often struggle with seemingly easy tasks like arithmetic despite their vast capabilities. This stark disparity raises concerns about their safe and ethical use and hinders their widespread adoption. In this paper, we focus on a typical arithmetic task, integer multiplication, to explore and explain the imperfection of transformers in this domain. We provide a comprehensive analysis of a vanilla transformer trained to perform n-digit integer multiplication. Our observations indicate that the model decomposes the multiplication task into multiple parallel subtasks, sequentially optimizing each subtask for each digit to complete the final multiplication. Based on observation and analysis, we infer the reasons for transformers' deficiencies in multiplication…
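One simple way to observe digit-by-digit behaviour like this is to score each output position separately rather than scoring whole answers. The sketch below is an assumption about how such a measurement could be done, not the paper's evaluation code. The name `per_digit_accuracy` and the input format (pairs of predicted and target integers) are hypothetical.

```python
def per_digit_accuracy(pairs, n_out: int):
    """Accuracy per output position, least significant digit first.

    pairs: iterable of (predicted, target) non-negative integers.
    n_out: number of output digit positions to score.
    Hypothetical helper for illustration only.
    """
    correct = [0] * n_out
    total = 0
    for pred, target in pairs:
        # Zero-pad to n_out digits, then reverse so index 0 is the units digit.
        p = str(pred).zfill(n_out)[::-1]
        t = str(target).zfill(n_out)[::-1]
        for k in range(n_out):
            correct[k] += p[k] == t[k]
        total += 1
    if total == 0:
        raise ValueError("pairs must contain at least one example")
    return [c / total for c in correct]


if __name__ == "__main__":
    # Two toy predictions for 47 * 86 = 4042; the second has a wrong hundreds digit.
    print(per_digit_accuracy([(4042, 4042), (4142, 4042)], n_out=4))
```

Tracking these per-position values over training steps would show which digits are learned first and which lag behind.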

Code & Models

Repositories

Chloe817/MultiplicationInTransformers (PyTorch, official)

Taxonomy

Topics: Advanced Data Storage Technologies · Advancements in Semiconductor Devices and Circuit Design

Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Focus · Label Smoothing · Linear Layer · GPT-4 · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings