From Letters to Words and Back: Invertible Coding of Stationary Measures
Łukasz Dębowski

TL;DR
This paper introduces an invertible measure-preserving mapping called normalized transport between probability measures on infinite sequences over different alphabets, with applications in statistical language modeling and recurrence times.
Contribution
It develops the normalized transport, a mapping built on self-avoiding codes, and connects it to the usual measure transport, to ergodic properties, and to entropy rates of stationary measures.
Findings
Normalized transport preserves stationarity and ergodicity.
Successive recurrence times are ergodic for ergodic measures.
The entropy rates of the processes linked by the normalized transport are related to each other.
Abstract
Motivated by problems of statistical language modeling, we consider probability measures on infinite sequences over two countable alphabets of different cardinalities, such as letters and words. We introduce an invertible mapping between such measures, called the normalized transport, that preserves both stationarity and ergodicity. The normalized transport applies so-called self-avoiding codes, which generalize comma-separated codes and specialize bijective stationary codes. The normalized transport is also connected to the usual measure transport via underlying asymptotically mean stationary measures. It preserves the ergodic decomposition. The normalized transport and self-avoiding codes arise in the problem of successive recurrence times. We show that successive recurrence times are ergodic for an ergodic measure, which strengthens a result by Chen Moy from 1959. We also relate the…
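As a rough illustration of the comma-separated codes mentioned in the abstract, the sketch below maps a finite sequence of words to a sequence of letters and back. It is not the paper's construction: it assumes a separator letter that never occurs inside a word, works on finite sequences only, and ignores the measure-theoretic side entirely. The names `SEP`, `encode`, and `decode` are hypothetical.

```python
from typing import List

# Assumption: the separator ("comma") is a letter that never occurs inside a word.
SEP = " "


def encode(words: List[str], sep: str = SEP) -> str:
    """Concatenate words, terminating each one with the separator."""
    for w in words:
        if not w or sep in w:
            raise ValueError(f"word {w!r} is empty or contains the separator")
    return "".join(w + sep for w in words)


def decode(letters: str, sep: str = SEP) -> List[str]:
    """Invert encode(): cut the letter sequence at every separator."""
    if not letters:
        return []
    if not letters.endswith(sep):
        raise ValueError("letter sequence ends in the middle of a word")
    # split() yields one trailing empty string after the final separator.
    return letters.split(sep)[:-1]


if __name__ == "__main__":
    words = ["to", "be", "or", "not", "to", "be"]
    letters = encode(words)
    # The round trip recovers the original word sequence.
    assert decode(letters) == words
```

Because every word ends with the separator and no word contains it, word boundaries can be read off the letter sequence unambiguously, which is what makes this toy mapping invertible.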
Taxonomy
Topics: Cellular Automata and Applications
