Identification of Probabilities of Languages
Paul M. B. Vitanyi (CWI, University of Amsterdam), Nick Chater (Behavioural Science Group, Warwick Business School, University of Warwick)

TL;DR
This paper investigates methods for inferring language probability distributions from infinite data sequences under various assumptions, providing effective procedures for identification and convergence in different computability and dependence scenarios.
Contribution
It introduces new algorithms for identifying language probability distributions, handling both computable and incomputable cases, with convergence guarantees under realistic assumptions.
Findings
Effective procedures for almost sure identification of computable distributions
Pointwise convergence to target distributions in incomputable cases
Identification of typical measures for dependent data in finite languages
Abstract
We consider the problem of inferring the probability distribution associated with a language, given data consisting of an infinite sequence of elements of the language. We do this under two assumptions on the algorithms concerned: (i) like a real-life algorithm, it has round-off errors, and (ii) it has no round-off errors. Assuming (i), we (a) consider a probability mass function of the elements of the language if the data are drawn independent identically distributed (i.i.d.), provided the probability mass function is computable and has a finite expectation. We give an effective procedure to almost surely identify in the limit the target probability mass function using the Strong Law of Large Numbers. Second, (b) we treat the case of possibly incomputable probability mass functions in the above setting. In this case we can only pointwise converge to the target probability mass function…
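As a rough illustration of the role the Strong Law of Large Numbers plays in case (a), the sketch below draws i.i.d. samples from an assumed target and checks that the empirical frequencies approach it. The geometric target and the names `geometric_pmf`, `sample_geometric`, `empirical_pmf`, and `max_error` are hypothetical choices made for this example; this is not the paper's identification procedure, only the frequency-convergence step it relies on.

```python
import random
from collections import Counter

def geometric_pmf(k, p=0.5):
    """Assumed target pmf on k = 0, 1, 2, ...: P(k) = (1 - p)^k * p (finite expectation)."""
    return (1 - p) ** k * p

def sample_geometric(rng, p=0.5):
    """Draw one element i.i.d. from the target: count failures before the first success."""
    k = 0
    while rng.random() >= p:
        k += 1
    return k

def empirical_pmf(samples):
    """Relative frequency of each observed element."""
    counts = Counter(samples)
    n = len(samples)
    return {k: c / n for k, c in counts.items()}

def max_error(estimate, pmf, support):
    """Largest absolute gap between estimate and target over a finite set of elements."""
    return max(abs(estimate.get(k, 0.0) - pmf(k)) for k in support)

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [sample_geometric(rng) for _ in range(100_000)]

# Error of the empirical pmf on elements 0..9 for growing prefixes of the data.
errors = {
    n: max_error(empirical_pmf(samples[:n]), geometric_pmf, range(10))
    for n in (100, 10_000, 100_000)
}
for n, err in errors.items():
    print(f"n={n:>7}  max error on 0..9: {err:.4f}")
```

The error on any fixed element shrinks as the prefix grows, which is the pointwise convergence the abstract refers to. Identifying the exact target in the limit additionally requires selecting among computable candidates, which this sketch does not attempt.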
Taxonomy
Topics: Computability, Logic, AI Algorithms · Fractal and DNA sequence analysis · semigroups and automata theory
