Non-parametric change-point detection using string matching algorithms
Oliver Johnson, Dino Sejdinovic, James Cruise, Ayalvadi Ganesh, Robert Piechocki

TL;DR
This paper introduces CRECHE, a non-parametric change-point estimator based on string matching, which detects changes in the statistical properties of a finite-alphabet data source without assuming any form for the source distribution.
Contribution
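The paper proposes CRECHE (CRossings Enumeration CHange Estimator), a novel non-parametric estimator for change-point detection motivated by match lengths in information theory. It is evaluated on simulated and real data, and its consistency is established under a related toy model.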
Findings
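In simulations, CRECHE accurately detects change-points, including the point at which a source changes from a Markov chain to an IID source with the same stationary distribution.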
The estimator performs well on real concatenated text data.
It requires no assumptions about the form of the source distribution and avoids the need to estimate its probabilities.
Abstract
Given the output of a data source taking values in a finite alphabet, we wish to detect change-points, that is, times when the statistical properties of the source change. Motivated by ideas of match lengths in information theory, we introduce a novel non-parametric estimator which we call CRECHE (CRossings Enumeration CHange Estimator). We present simulation evidence that this estimator performs well, both for simulated sources and for real data formed by concatenating text sources. For example, we show that we can accurately detect the point at which a source changes from a Markov chain to an IID source with the same stationary distribution. Our estimator requires no assumptions about the form of the source distribution, and avoids the need to estimate its probabilities. Further, we establish consistency of the CRECHE estimator under a related toy model, by establishing a fluid limit…
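To make the Markov-to-IID example in the abstract concrete, here is a minimal Python sketch that generates such a test sequence. It does not implement CRECHE, whose definition is not given on this page. The two-symbol alphabet, the transition probabilities, and the function names `stationary_two_state` and `markov_then_iid` are illustrative assumptions, not taken from the paper.

```python
import random


def stationary_two_state(p01, p10):
    """Stationary distribution (pi0, pi1) of a two-state Markov chain
    with P(0 -> 1) = p01 and P(1 -> 0) = p10."""
    total = p01 + p10
    return (p10 / total, p01 / total)


def markov_then_iid(n, change_point, p01=0.1, p10=0.3, seed=0):
    """Return n symbols in {0, 1}: the first `change_point` symbols come from
    a two-state Markov chain started in its stationary distribution, the rest
    are IID draws from that same stationary distribution.

    All parameter values are hypothetical choices for illustration.
    """
    if not 1 <= change_point <= n:
        raise ValueError("change_point must satisfy 1 <= change_point <= n")
    rng = random.Random(seed)
    pi0, _ = stationary_two_state(p01, p10)

    # Markov segment: start from the stationary law, then follow transitions.
    x = 0 if rng.random() < pi0 else 1
    out = [x]
    for _ in range(1, change_point):
        if x == 0:
            x = 1 if rng.random() < p01 else 0
        else:
            x = 0 if rng.random() < p10 else 1
        out.append(x)

    # IID segment: same marginal distribution, but no dependence on the past.
    for _ in range(change_point, n):
        out.append(0 if rng.random() < pi0 else 1)
    return out


if __name__ == "__main__":
    seq = markov_then_iid(n=20000, change_point=10000)
    before, after = seq[:10000], seq[10000:]
    # Symbol frequencies should be similar on both sides of the change.
    print(before.count(0) / len(before), after.count(0) / len(after))
```

Because both segments share the same marginal distribution, symbol frequencies alone do not reveal the change; only the dependence structure differs, which is what makes this case a demanding test for a change-point estimator.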
