Language Generation and Identification From Partial Enumeration: Tight Density Bounds and Topological Characterizations
Jon Kleinberg, Fan Wei

TL;DR
This paper proves a tight bound on the lower density achievable in language generation in the limit, extends the model to partial enumeration, and gives a topological characterization of when language identification in the limit is possible.
Contribution
The paper proves a tight density bound of 1/2 for language generation in the limit, extends the model to partial enumeration, and gives a new topological characterization of language identification.
Findings
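Proved a tight bound of 1/2 on the best achievable lower density for language generation in the limit.
Extended the model to partial enumeration, where generation in the limit remains achievable and the output density is within a factor of 1/2 of the revealed subset's density.
Provided a topological characterization of when language identification in the limit is possible.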
Abstract
The success of large language models (LLMs) has motivated formal theories of language generation and learning. We study the framework of language generation in the limit, where an adversary enumerates strings from an unknown language K drawn from a countable class, and an algorithm must generate unseen strings from K. Prior work showed that generation is always possible, and that some algorithms achieve positive lower density, revealing a validity–breadth trade-off between correctness and coverage. We resolve a main open question in this line, proving a tight bound of 1/2 on the best achievable lower density. We then strengthen the model to allow partial enumeration, where the adversary reveals only an infinite subset C ⊆ K. We show that generation in the limit remains achievable, and if C has lower density α in K, the algorithm's output…
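The notion of lower density used in the abstract can be made concrete with a small sketch. It assumes the standard definition from this line of work: the elements of K are listed in a fixed order k_1, k_2, …, and the lower density of the generator's output set O in K is liminf_{n→∞} |O ∩ {k_1, …, k_n}| / n. The Python below computes these prefix ratios on a finite prefix and reports a tail minimum as a rough proxy for the liminf. The function names prefix_densities and tail_infimum, and the toy choice of K as the natural numbers, are hypothetical illustrations, not the paper's construction; a finite prefix can only suggest a limit, never certify it.

```python
from collections.abc import Hashable, Sequence
from fractions import Fraction


def prefix_densities(k_prefix: Sequence[Hashable], output: set) -> list[Fraction]:
    """Return |O ∩ {k_1, ..., k_n}| / n for n = 1, ..., len(k_prefix).

    k_prefix: the first elements of K, in K's fixed enumeration order.
    output:   a finite view of the generator's output set O (hypothetical input).
    """
    ratios: list[Fraction] = []
    hits = 0
    for n, k in enumerate(k_prefix, start=1):
        if k in output:
            hits += 1
        ratios.append(Fraction(hits, n))
    return ratios


def tail_infimum(ratios: Sequence[Fraction], m: int) -> Fraction:
    """Minimum of the prefix ratios over m <= n <= len(ratios).

    The lower density is the limit of inf_{n >= m} as m grows; on a finite
    prefix this is only a heuristic proxy, not the true liminf.
    """
    if not 1 <= m <= len(ratios):
        raise ValueError("m must lie in 1..len(ratios)")
    return min(ratios[m - 1:])


# Toy example (an assumption, not from the paper): K is the natural numbers
# in increasing order, and the generator outputs exactly the even numbers.
k_prefix = list(range(1000))
evens = {k for k in k_prefix if k % 2 == 0}
print(tail_infimum(prefix_densities(k_prefix, evens), 100))  # 1/2
```

In this toy run, an output set containing every other element of K has lower density 1/2, the same value as the paper's tight bound; the example says nothing about how a generator that must stay valid actually achieves such an output set.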
Taxonomy
Topics: Machine Learning and Algorithms · semigroups and automata theory · Natural Language Processing Techniques
