Hardness of Learning Regular Languages in the Next Symbol Prediction Setting
Satwik Bhattamishra, Phil Blunsom, Varun Kanade

TL;DR
This paper investigates the difficulty of learning regular languages in the next symbol prediction setting, showing that even with richer labels than conventional classification provides, the problem remains computationally hard under cryptographic assumptions.
Contribution
The paper formalizes the Next Symbol Prediction (NSP) learning setting so that it is amenable to PAC-learning analysis, and proves that learning regular languages remains computationally hard within this framework, extending classical hardness results.
Findings
Learning regular languages in the NSP setting is computationally hard.
The hardness holds even with richer label information.
Under cryptographic assumptions, no efficient algorithm exists for this task.
Abstract
We study the learnability of languages in the Next Symbol Prediction (NSP) setting, where a learner receives only positive examples from a language together with, for every prefix, (i) whether the prefix itself is in the language and (ii) which next symbols can lead to an accepting string. This setting has been used in prior works to empirically analyze neural sequence models, and additionally, we observe that efficient algorithms for the NSP setting can be used to learn the (truncated) support of language models. We formalize the setting so as to make it amenable to PAC-learning analysis. While the setting provides a much richer set of labels than the conventional classification setting, we show that learning concept classes such as DFAs and Boolean formulas remains computationally hard. The proof is via a construction that makes almost all additional labels uninformative, yielding a…
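To make the two kinds of NSP labels concrete, here is a minimal Python sketch that computes them for one positive example of a small DFA. It is an illustration under stated assumptions, not the paper's formal definition: the function names, the dictionary encoding of the DFA, and the example language (ab)* are all hypothetical, and "can lead to an accepting string" is read here as "the symbol moves the DFA to a state from which some accepting state is still reachable".

```python
from collections import deque

def live_states(transitions, accepting):
    """Return the states from which some accepting state is reachable."""
    # Build reverse edges, then search backwards from the accepting states.
    reverse = {}
    for (src, _symbol), dst in transitions.items():
        reverse.setdefault(dst, set()).add(src)
    live = set(accepting)
    queue = deque(accepting)
    while queue:
        state = queue.popleft()
        for pred in reverse.get(state, ()):
            if pred not in live:
                live.add(pred)
                queue.append(pred)
    return live

def nsp_labels(transitions, start, accepting, alphabet, string):
    """Label every prefix of `string` with (prefix, in_language, allowed_next).

    Assumes `transitions` is a total map (state, symbol) -> state.
    """
    live = live_states(transitions, accepting)
    labels = []
    state = start
    for i in range(len(string) + 1):
        in_language = state in accepting                      # label (i)
        allowed_next = {s for s in alphabet
                        if transitions[(state, s)] in live}   # label (ii)
        labels.append((string[:i], in_language, allowed_next))
        if i < len(string):
            state = transitions[(state, string[i])]
    return labels

# Hypothetical example DFA for (ab)*: state 0 accepts, state 2 is a dead state.
AB_STAR = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 2,
}

if __name__ == "__main__":
    for prefix, member, allowed in nsp_labels(AB_STAR, 0, {0}, "ab", "abab"):
        print(repr(prefix), member, sorted(allowed))
```

For the positive example "abab", every prefix of even length is labeled as a member with only "a" permitted next, and every prefix of odd length is labeled as a non-member with only "b" permitted next. A single positive string therefore carries one membership bit and one set of allowed symbols per prefix, which is the richer labeling the abstract contrasts with conventional classification.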
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Natural Language Processing Techniques
