Banach density of generated languages: Dichotomies in topology and dimension
Jon Kleinberg, Fan Wei

TL;DR
This paper investigates the Banach density of generated languages embedded in multiple dimensions, revealing topological and dimensional dichotomies that influence the density achievable by algorithms.
Contribution
It introduces the use of Banach density in language generation, uncovering new topological and dimensional structures affecting generative models.
Findings
In dimension one, when the underlying topological space has finite Cantor-Bendixson rank, algorithms can achieve Banach density 1/2.
Infinite Cantor-Bendixson rank can prevent any positive Banach density from being achieved.
In higher dimensions, Ramsey-theoretic obstacles require nondegeneracy conditions for positive density.
Abstract
The formalism of language generation in the limit studies generative models by requiring an algorithm, given strings from a hidden true language, to eventually generate new valid strings. A core issue is the tension between validity and breadth. Prior work quantified breadth via asymptotic density, where the priority is generating strings early in a natural countable ordering. Here, we study density when the strings are embedded in d dimensions, a ubiquitous structure in current generative models. Our goal is for the generated strings to be dense throughout the embedding. This requires a different measure, the Banach density, which captures whether a set contains large sparse regions. Using Banach density uncovers a rich structure based on dimension and the topology of the language collection. We prove that in dimension one, when the underlying topological space has finite…
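To make the contrast between asymptotic density and Banach density concrete, here is a minimal Python sketch in dimension one. It assumes the standard notion of lower Banach density for subsets of the positive integers (the limiting minimum fraction of the set inside any window of a given length), which may differ in detail from the paper's definition. The function names and the example set are hypothetical, and a finite truncation is only a proxy for the true limits.

```python
def prefix_density(members, n):
    """Fraction of 1..n that lies in the set (finite proxy for asymptotic density)."""
    return sum(1 for x in range(1, n + 1) if x in members) / n


def min_window_density(members, limit, window):
    """Smallest fraction of the set inside any length-`window` interval of 1..limit.

    Finite proxy for lower Banach density: a single sparse window anywhere
    in the range drives this value down.
    """
    flags = [1 if x in members else 0 for x in range(1, limit + 1)]
    count = sum(flags[:window])          # window covering 1..window
    best = count
    for start in range(1, limit - window + 1):
        # Slide the window one step right: add the new element, drop the old one.
        count += flags[start + window - 1] - flags[start - 1]
        best = min(best, count)
    return best / window


def gappy_set(limit):
    """All of 1..limit except a gap of length k starting at 2**k, for each k >= 1.

    Illustrative example only (an assumption, not taken from the paper).
    """
    removed = set()
    k = 1
    while 2 ** k <= limit:
        removed.update(range(2 ** k, 2 ** k + k))
        k += 1
    return {x for x in range(1, limit + 1) if x not in removed}


if __name__ == "__main__":
    limit = 4096
    gappy = gappy_set(limit)
    evens = {x for x in range(1, limit + 1) if x % 2 == 0}
    print("gappy prefix density:", prefix_density(gappy, limit))
    print("gappy min window density (window 8):", min_window_density(gappy, limit, 8))
    print("evens min window density (window 8):", min_window_density(evens, limit, 8))
```

The gappy set fills almost all of any long prefix, yet it contains arbitrarily long empty stretches, so its window-based density collapses. The even numbers are spread evenly, so both measures agree on them.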
