Quantifying Noise in Language Generation

Aaron Li; Ian Zhang

arXiv:2601.21237 · cs.DS · January 30, 2026

TL;DR

This paper quantifies the effect of noise in the language generation in the limit model. It shows that even a single extraneous string strictly reduces the set of generatable collections, and that generation with one noisy string is equivalent to generation with any finite amount of noise.

Contribution

It provides the first characterization of noise-dependent generatability and demonstrates that a single noisy string already has a profound effect, in contrast with previous hierarchical results.

Findings

01

A single noisy string reduces the set of generatable collections.

02

Generation with one noisy string is equivalent to any finite amount of noise.

03

First characterization of noise-dependent generatability in language generation.

Abstract

Kleinberg and Mullainathan recently proposed a formal framework for studying the phenomenon of language generation, called language generation in the limit. In this model, an adversary gives an enumeration of example strings from an unknown target language, and the algorithm is tasked with correctly generating unseen strings from the target language within finite time. Refined notions of non-uniform and uniform generation were later introduced by Li, Raman, and Tewari (2025), and a noisy model was introduced by Raman and Raman (2025), which allows the adversary to insert extraneous strings. A natural question in the noisy model is to quantify the effect of noise, by studying the impact of each additional extraneous string. We show two complementary results in this setting. We first show that for both uniform and non-uniform generation, a single noisy string strictly reduces the set of…
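To make the setting in the abstract concrete, here is a toy Python sketch of the generation game with and without an extraneous string. It is an illustration under simplifying assumptions, not the algorithm or construction from the paper: integers stand in for strings, the collection is a hypothetical two-language example, and the names `consistent`, `generate`, `noise_budget`, and `search_limit` are invented for this sketch. The generator keeps every language that disagrees with at most `noise_budget` of the examples seen so far and outputs an unseen element lying in all of them.

```python
from typing import Callable, Iterable, List, Optional

# A language is modelled as a membership test; integers stand in for strings.
Language = Callable[[int], bool]


def consistent(lang: Language, seen: Iterable[int], noise_budget: int) -> bool:
    """True if at most `noise_budget` of the seen examples fall outside `lang`."""
    misses = sum(1 for x in seen if not lang(x))
    return misses <= noise_budget


def generate(collection: List[Language], seen: List[int],
             noise_budget: int = 0, search_limit: int = 10_000) -> Optional[int]:
    """Return an unseen element that lies in every still-plausible language.

    Returns None if no language is plausible or no element is found below
    `search_limit` (a bound assumed here only to keep the toy search finite).
    """
    candidates = [lang for lang in collection
                  if consistent(lang, seen, noise_budget)]
    if not candidates:
        return None
    seen_set = set(seen)
    for x in range(1, search_limit):
        if x not in seen_set and all(lang(x) for lang in candidates):
            return x
    return None


# Hypothetical collection: even numbers and multiples of 3.
evens: Language = lambda x: x % 2 == 0
multiples_of_3: Language = lambda x: x % 3 == 0
collection = [evens, multiples_of_3]

clean = [2, 4, 8]      # noiseless enumeration prefix of the even numbers
noisy = [2, 4, 9, 8]   # same prefix with one extraneous string (9) inserted

print(generate(collection, clean))                  # noiseless game
print(generate(collection, noisy))                  # generator unaware of noise
print(generate(collection, noisy, noise_budget=1))  # generator tolerating one noisy string
print(generate(collection, [6, 3]))                 # only multiples of 3 are plausible
print(generate(collection, [6, 3], noise_budget=1)) # both languages stay plausible
```

In this toy, a generator that assumes no noise finds no consistent language once the extraneous string appears, while tolerating one mismatch restores a valid output. The last two calls show the price of that tolerance: more languages remain plausible, so the generator has to be more conservative about what it emits.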

Taxonomy

Topics: Machine Learning and Algorithms · Semigroups and Automata Theory · Natural Language Processing Techniques