Language Generation: Complexity Barriers and Implications for Learning
Marcelo Arenas, Pablo Barceló, Luis Cofré, Alexander Kozachinskiy

TL;DR
This paper studies the sample complexity of language generation in the limit for several canonical classes of formal languages and shows that generation, although always possible in principle, is infeasible in many cases.
Contribution
It analyzes sample complexity barriers to language generation in the limit across several formal language classes, exposing a gap between theoretical possibility and computational feasibility.
Findings
Infeasibility already appears for context-free and regular languages.
Infeasibility persists for strict subclasses such as locally threshold testable languages.
Infeasibility also occurs for incomparable classes such as non-erasing pattern languages.
Abstract
Kleinberg and Mullainathan showed that language generation in the limit is always possible at the level of computability: given enough positive examples, a learner can eventually generate data indistinguishable from a target language. However, such existence results do not address feasibility. We study the sample complexity of language generation in the limit for several canonical classes of formal languages. Our results show that infeasibility already appears for context-free and regular languages, and persists even for strict subclasses such as locally threshold testable languages, as well as for incomparable classes such as non-erasing pattern languages, a well-studied class in the theory of language identification. Overall, our results establish a clear gap between the theoretical possibility of language generation in the limit and its computational feasibility.
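To make the setting concrete, here is a minimal toy sketch of a learner that generates from positive examples. It is not the algorithm of Kleinberg and Mullainathan, nor any construction from this paper. The three candidate languages, their names, and the function names are all hypothetical, and the "pick the smallest consistent language" rule is only safe here because the candidates form a nested chain.

```python
from itertools import count

# Hypothetical toy collection over the alphabet {a}, ordered from the
# smallest language to the largest (each is a subset of the next).
def _lengths(divisor):
    # Language of non-empty strings a^n with n divisible by `divisor`.
    return lambda s: len(s) > 0 and set(s) == {"a"} and len(s) % divisor == 0

CANDIDATES = [
    ("mult4", _lengths(4)),
    ("mult2", _lengths(2)),
    ("all", _lengths(1)),
]

def generate(seen):
    """Return (guess, new_string) given the set of positive examples seen.

    Assumption: the target is one of CANDIDATES. The learner guesses the
    smallest candidate consistent with every example, then outputs the
    shortest string of that candidate that has not been seen yet.
    """
    for name, member in CANDIDATES:
        if all(member(s) for s in seen):
            for n in count(1):
                s = "a" * n
                if member(s) and s not in seen:
                    return name, s
    raise ValueError("examples are inconsistent with every candidate")

if __name__ == "__main__":
    stream = ["aaaa", "aa", "a"]  # positive examples arriving one by one
    seen = set()
    for example in stream:
        seen.add(example)
        print(sorted(seen, key=len), "->", generate(seen))
```

Because the chain is nested, every string the toy learner outputs belongs to the target even while its guess is still too small. The question studied in the paper is how many examples such a learner needs for realistic language classes, which this sketch does not attempt to model.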
Taxonomy
Topics: Machine Learning and Algorithms · Text Readability and Simplification · Natural Language Processing Techniques
