Can Transformers Learn $n$-gram Language Models?
Anej Svete, Nadav Borenstein, Mike Zhou, Isabelle Augenstein, Ryan Cotterell

TL;DR
This paper investigates whether transformers can learn $n$-gram language models. It compares them with classical estimation methods on two kinds of random $n$-gram LMs, showing where transformers excel and where they underperform.
Contribution
The study empirically evaluates transformers' ability to learn random n-gram language models, contrasting their performance with classical methods and identifying scenarios where transformers outperform specialized n-gram learners.
Findings
Transformers outperform classical methods on n-gram models with shared parameters.
Classical estimation techniques outperform transformers on n-gram models with arbitrary next-symbol probabilities.
Abstract
Much theoretical work has described the ability of transformers to represent formal languages. However, linking theoretical results to empirical performance is not straightforward due to the complex interplay between the architecture, the learning algorithm, and training data. To test whether theoretical lower bounds imply learnability of formal languages, we turn to recent work relating transformers to $n$-gram language models (LMs). We study transformers' ability to learn random $n$-gram LMs of two kinds: ones with arbitrary next-symbol probabilities and ones where those are defined with shared parameters. We find that classic estimation techniques for $n$-gram LMs such as add-$\lambda$ smoothing outperform transformers on the former, while transformers perform better on the latter, outperforming methods specifically designed to learn $n$-gram LMs.
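For readers unfamiliar with the classical baseline named in the abstract, the following is a minimal sketch of add-$\lambda$ smoothing for an $n$-gram LM. It is an illustration under stated assumptions, not the authors' implementation: the function name `train_add_lambda`, the padding symbols `BOS` and `EOS`, and the toy corpus are all hypothetical. The estimate for a symbol $y$ after context $c$ is $(\mathrm{count}(c, y) + \lambda) / (\mathrm{count}(c) + \lambda |V|)$, where $|V|$ is the vocabulary size.

```python
from collections import Counter, defaultdict

# Hypothetical padding symbols, chosen for this sketch only.
BOS = "<bos>"
EOS = "<eos>"


def train_add_lambda(corpus, n, vocab, lam=1.0):
    """Estimate an n-gram LM with add-lambda smoothing.

    corpus: iterable of token sequences (without padding).
    vocab:  list of symbols that can be predicted (should include EOS).
    Returns a function prob(symbol, context) -> float.
    """
    counts = defaultdict(Counter)
    for sentence in corpus:
        # Pad so that every position has a full (n-1)-symbol context.
        padded = [BOS] * (n - 1) + list(sentence) + [EOS]
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            counts[context][padded[i]] += 1

    def prob(symbol, context):
        # Unseen contexts fall back to an empty counter, i.e. a uniform estimate.
        c = counts.get(tuple(context), Counter())
        return (c[symbol] + lam) / (sum(c.values()) + lam * len(vocab))

    return prob


# Toy usage: a bigram model over a two-letter alphabet.
vocab = ["a", "b", EOS]
prob = train_add_lambda([["a", "b"], ["a", "a"]], n=2, vocab=vocab, lam=1.0)
```

With $\lambda > 0$ every next symbol receives nonzero probability, including in contexts never seen in training, where the estimate is uniform over the vocabulary.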
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
