What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages
Nadav Borenstein, Anej Svete, Robin Chan, Josef Valvoda, Franz Nowak, Isabelle Augenstein, Eleanor Chodroff, Ryan Cotterell

TL;DR
This paper empirically investigates how well neural language models, specifically RNNs and Transformers, learn probabilistic regular languages, analyzing factors such as the rank of the regular language model (RLM) and the expected string length.
Contribution
It introduces an empirical framework for assessing the learnability of probabilistic regular languages by neural models, focusing on measurable complexity parameters.
Findings
RLM rank strongly predicts learnability for RNNs and Transformers.
Expected string length significantly influences model learnability.
The relative importance of the remaining predictors differs between RNNs and Transformers.
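Expected string length is a property of the RLM itself and, for a tight deterministic probabilistic finite-state automaton (PFSA), can be computed in closed form rather than estimated. The sketch below is a standard derivation, not necessarily how the authors obtained it (they may, for instance, estimate it from samples). It assumes a hypothetical `transitions` dict mapping each state to its (symbol, probability, next state) arcs, with the end-of-string (EOS) probability being the leftover mass. If l[q] is the expected remaining length from state q, T[q][q'] the total probability of moving from q to q', and eta[q] = p(EOS | q), then l = (1 - eta) + T l, so the sketch solves (I - T) l = 1 - eta exactly over rationals and averages l under the initial distribution.

```python
from fractions import Fraction


def expected_length(transitions, initial):
    """Expected string length of a tight deterministic PFSA.

    transitions: state -> list of (symbol, prob, next_state); every state
    (even one that can only emit EOS) must appear as a key. The EOS
    probability of a state is 1 minus the sum of its arc probabilities.
    initial: state -> initial probability.
    """
    states = sorted(transitions)
    index = {q: i for i, q in enumerate(states)}
    n = len(states)
    # Augmented system [I - T | 1 - eta]: subtract each arc from the identity
    # row of its source state and add its probability to the right-hand side.
    A = [[Fraction(int(i == j)) for j in range(n)] + [Fraction(0)] for i in range(n)]
    for q, arcs in transitions.items():
        i = index[q]
        for _symbol, p, q_next in arcs:
            A[i][index[q_next]] -= p
            A[i][n] += p
    # Gauss-Jordan elimination; StopIteration here means I - T is singular,
    # i.e. the automaton is not tight (strings may never terminate).
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    ell = [A[i][n] / A[i][i] for i in range(n)]  # expected remaining length per state
    return sum(p * ell[index[q]] for q, p in initial.items())


# Hypothetical two-state PFSA: p(EOS | 0) = 1/4, p(EOS | 1) = 2/3.
transitions = {
    0: [("a", Fraction(1, 2), 1), ("b", Fraction(1, 4), 0)],
    1: [("a", Fraction(1, 3), 0)],
}
initial = {0: Fraction(1)}

print(expected_length(transitions, initial))  # 11/7
```

Exact rational arithmetic is used so the result can be checked by hand; for large automata a floating-point linear solver would be the practical choice.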
Abstract
What can large language models learn? By definition, language models (LMs) are distributions over strings. Therefore, an intuitive way of addressing the above question is to formalize it as a matter of learnability of classes of distributions over strings. While prior work in this direction focused on assessing the theoretical limits, we seek to understand the empirical learnability. Unlike prior empirical work, we evaluate neural LMs on their home turf (learning probabilistic languages) rather than as classifiers of formal languages. In particular, we investigate the learnability of regular LMs (RLMs) by RNN and Transformer LMs. We empirically test the learnability of RLMs as a function of various complexity parameters of the RLM and the hidden state size of the neural LM. We find that the RLM rank, which corresponds to the size of the linear space spanned by the logits of its…
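The notion of RLM rank can be illustrated with a softmax-parametrized RLM whose next-symbol logits are inner products of R-dimensional state and symbol vectors, so that the |Q| × |Σ ∪ {EOS}| logit matrix has rank at most R. This is a minimal sketch under assumptions: the abstract is cut off before the full definition, so treating the rank as the rank of the unnormalized logit matrix is an assumption here, and `state_emb`, `symbol_emb`, `numerical_rank`, and `softmax` are hypothetical names chosen for illustration.

```python
import math


def numerical_rank(matrix, tol=1e-9):
    """Rank of a real matrix via Gaussian elimination with partial pivoting."""
    rows = [[float(x) for x in row] for row in matrix]  # work on a float copy
    n_rows, n_cols = len(rows), len(rows[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the unused row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) <= tol:
            continue  # no usable pivot: this column adds no new dimension
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, n_rows):
            factor = rows[r][col] / rows[rank][col]
            for c in range(col, n_cols):
                rows[r][c] -= factor * rows[rank][c]
        rank += 1
    return rank


def softmax(logits):
    """Turn one state's logits into a next-symbol distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


# Hypothetical rank-2 parametrization: 3 states, alphabet {a, b, c} plus EOS.
state_emb = [[1, 0], [0, 1], [1, 1]]              # h_q for q0, q1, q2
symbol_emb = [[1, 2], [0, 1], [3, 0], [1, 1]]     # e_a for a, b, c, EOS
logits = [[sum(h * e for h, e in zip(hq, ea)) for ea in symbol_emb]
          for hq in state_emb]
next_symbol = [softmax(row) for row in logits]    # p(. | q) for each state

print(numerical_rank(logits))  # 2: q2's logits are the sum of q0's and q1's
```

Because log-probabilities differ from these logits by a per-state normalization constant (a rank-one correction), the rank of the log-probability matrix can exceed the logit rank by at most one, which is why the normalization convention matters when comparing ranks.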
Taxonomy
Topics: Machine Learning and Algorithms · Natural Language Processing Techniques · Topic Modeling
Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Multi-Head Attention
