Nonparametric statistical inference for the context tree of a stationary ergodic process
Sandro Gallo, Florencia Leonardi

TL;DR
This paper investigates the limits of nonparametric inference for the context tree of stationary ergodic processes, establishing conditions for consistent estimation and proposing a practical lower bound estimator with applications in linguistics.
Contribution
The paper introduces a framework, built on a Hamming metric over irreducible context trees, for deciding when context tree estimation is feasible, and it provides a consistent, computationally efficient lower-bound estimator with an explicit formula for its coverage probability.
Findings
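No consistent estimator of the context tree exists when the Hamming metric is unbounded; even in the bounded case, no two-sided confidence bounds exist.
One-sided inference is possible: the authors construct a consistent estimator that is a lower bound for the context tree, with an explicit formula for the coverage probability.
The method is applied to the context tree of European Portuguese texts to test a linguistic hypothesis.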
Abstract
We consider the problem of estimating the context tree of a stationary ergodic process with finite alphabet without imposing additional conditions on the process. As a starting point we introduce a Hamming metric in the space of irreducible context trees and we use the properties of the weak topology in the space of ergodic stationary processes to prove that if the Hamming metric is unbounded, there exist no consistent estimators for the context tree. Even in the bounded case we show that there exist no two-sided confidence bounds. However we prove that one-sided inference is possible in this general setting and we construct a consistent estimator that is a lower bound for the context tree of the process with an explicit formula for the coverage probability. We develop an efficient algorithm to compute the lower bound and we apply the method to test a linguistic hypothesis about the…
