Can Transformers Learn $n$-gram Language Models?

Anej Svete; Nadav Borenstein; Mike Zhou; Isabelle Augenstein; Ryan Cotterell

arXiv:2410.03001·cs.CL·October 7, 2024

TL;DR

This paper investigates whether transformers can learn $n$-gram language models by training them on random $n$-gram LMs of two kinds. Classical estimators such as add-$\lambda$ smoothing do better when next-symbol probabilities are arbitrary, while transformers do better when those probabilities are defined with shared parameters.

Contribution

The study empirically evaluates transformers' ability to learn random n-gram language models, contrasting their performance with classical methods and identifying scenarios where transformers outperform specialized n-gram learners.

Findings

01

Transformers outperform classical methods on n-gram models with shared parameters.

02

Classical estimation techniques outperform transformers on arbitrary probability n-gram models.

03

On shared-parameter n-gram models, transformers also outperform methods designed specifically to learn n-gram LMs.

Abstract

Much theoretical work has described the ability of transformers to represent formal languages. However, linking theoretical results to empirical performance is not straightforward due to the complex interplay between the architecture, the learning algorithm, and training data. To test whether theoretical lower bounds imply learnability of formal languages, we turn to recent work relating transformers to $n$-gram language models (LMs). We study transformers' ability to learn random $n$-gram LMs of two kinds: ones with arbitrary next-symbol probabilities and ones where those are defined with shared parameters. We find that classic estimation techniques for $n$-gram LMs such as add-$\lambda$ smoothing outperform transformers on the former, while transformers perform better on the latter, outperforming methods specifically designed to learn $n$-gram LMs.
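
For readers unfamiliar with the classical baseline named in the abstract: add-$\lambda$ smoothing estimates each next-symbol probability as (count(context, symbol) + $\lambda$) / (count(context) + $\lambda$ · number of possible outcomes). The following is a minimal Python sketch of that estimator, not the authors' implementation. The function name `train_add_lambda`, the padding symbols `<s>` and `</s>`, and the toy corpus in the usage note are assumptions made for illustration.

```python
from collections import Counter


def train_add_lambda(corpus, n, vocab, lam=1.0):
    """Fit an add-lambda smoothed n-gram model (illustrative sketch).

    corpus: iterable of sequences of symbols
    n:      n-gram order (context length is n - 1)
    vocab:  iterable of symbols, excluding the padding symbols
    lam:    smoothing constant added to every count
    Returns (prob, outcomes), where prob(symbol, context) is the
    smoothed probability and outcomes lists every predictable symbol.
    """
    ngram_counts = Counter()
    context_counts = Counter()
    for seq in corpus:
        # Pad so the first symbol has a full context and the end is modelled.
        padded = ["<s>"] * (n - 1) + list(seq) + ["</s>"]
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            ngram_counts[context + (padded[i],)] += 1
            context_counts[context] += 1

    # The model predicts a vocabulary symbol or the end-of-sequence marker.
    outcomes = list(vocab) + ["</s>"]

    def prob(symbol, context):
        context = tuple(context)
        numerator = ngram_counts[context + (symbol,)] + lam
        denominator = context_counts[context] + lam * len(outcomes)
        return numerator / denominator

    return prob, outcomes
```

As a usage example, `prob, outcomes = train_add_lambda(["ab", "aa"], n=2, vocab="ab", lam=1.0)` fits a bigram model on two toy strings; `prob("a", ("a",))` then returns the smoothed probability of `a` following `a`, and a context never seen in training falls back to a uniform distribution over `outcomes`.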

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling