Agnostic Language Identification and Generation
Mikael Møller Høgsgaard, Chirag Pabbaraju

TL;DR
This paper introduces an agnostic framework for language identification and generation that drops the realizability assumption and places no restrictions on the distribution of the input data. It provides novel characterizations and nearly tight rates in this more general setting.
Contribution
It relaxes the realizability assumption entirely for both language identification and language generation, proposing objectives for the resulting agnostic setting and obtaining new characterizations and nearly tight statistical rates.
Findings
Develops objectives for agnostic language identification and generation.
Provides novel characterizations of the problems.
Achieves nearly tight statistical rates.
Abstract
Recent works on language identification and generation have established tight statistical rates at which these tasks can be achieved. These works typically operate under a strong realizability assumption: that the input data is drawn from an unknown distribution necessarily supported on some language in a given collection. In this work, we relax this assumption of realizability entirely, and impose no restrictions on the distribution of the input data. We propose objectives to study both language identification and generation in this more general "agnostic" setup. Across both problems, we obtain novel interesting characterizations and nearly tight rates.
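To make the realizability assumption concrete, here is a minimal sketch that is not taken from the paper. It assumes a toy setup in which languages are finite sets of strings and a distribution is a dictionary of probabilities. The helper name `is_realizable` and the example data are hypothetical. A distribution counts as realizable when its support lies inside some language in the collection. The agnostic setting described above allows distributions that fail this check.

```python
from typing import Dict, FrozenSet, List


def is_realizable(dist: Dict[str, float], collection: List[FrozenSet[str]]) -> bool:
    """Return True if the support of `dist` is contained in at least one language.

    Toy illustration only: languages are finite sets here, whereas the
    theoretical setting typically allows infinite languages.
    """
    support = {s for s, p in dist.items() if p > 0}
    return any(support <= lang for lang in collection)


# Hypothetical collection of two small "languages".
collection = [frozenset({"a", "ab", "abb"}), frozenset({"b", "bb"})]

# Supported entirely on the first language: the realizable case.
realizable_dist = {"a": 0.5, "ab": 0.5}

# Mass spread across both languages: allowed only in the agnostic case.
agnostic_dist = {"a": 0.5, "bb": 0.5}

print(is_realizable(realizable_dist, collection))
print(is_realizable(agnostic_dist, collection))
```

The sketch only illustrates which inputs each setting admits. It does not implement the paper's agnostic objectives or algorithms.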
