Statistical Properties of the Rooted-Tree Encoding of $\mathbb{N}$
Pierluigi Contucci, Claudio Giberti, Godwin Osabutey, Cecilia Vernia

TL;DR
This paper analyzes the statistical properties of a novel rooted-tree encoding of natural numbers, revealing complex patterns, correlations, and deviations from Zipf's law, with implications for understanding structured sequences.
Contribution
It introduces a recursive tree-based encoding of natural numbers and provides the first detailed statistical analysis of its properties and correlations.
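A minimal sketch of such an encoding, to make the recursion concrete. The base cases and the bracket convention here are assumptions for illustration, since this summary does not spell them out: 0 is taken to be a single node, 1 is read as 2^0, and every n >= 2 becomes a root whose ordered children encode the exponents of 2, 3, 5, ... up to the largest prime dividing n. The function names (`exponent_vector`, `dyck_word`) are hypothetical.

```python
def is_prime(m):
    """Trial-division primality test (adequate for small illustrative inputs)."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def next_prime(p):
    """Smallest prime strictly greater than p."""
    q = p + 1
    while not is_prime(q):
        q += 1
    return q


def exponent_vector(n):
    """Exponents of 2, 3, 5, ... in n (n >= 2), up to the largest prime dividing n."""
    exps = []
    p = 2
    while n > 1:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exps.append(e)  # zero is recorded for primes that do not divide n
        p = next_prime(p)
    return exps


def dyck_word(n):
    """Balanced-parenthesis (Dyck) word of the tree assumed to encode n."""
    if n == 0:
        return "()"  # assumed convention: 0 is a single node
    if n == 1:
        return "(" + dyck_word(0) + ")"  # assumed convention: 1 = 2^0
    # Recurse on each exponent; child order follows the order of the primes.
    return "(" + "".join(dyck_word(e) for e in exponent_vector(n)) + ")"


for n in range(7):
    print(n, dyck_word(n))
```

Under these assumptions, 3 = 2^0 · 3^1 yields a root with two children (the trees of 0 and 1), and 4 = 2^2 yields a root with a single child (the tree of 2). Keeping the zero exponents preserves the position of each prime, which is what makes the tree planar, i.e. ordered.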
Findings
Dictionary and entropy grow sublinearly
Compression exhibits non-monotonic trends
Rank-frequency curves are parabolic, deviating from Zipf's law
Abstract
We prime-encode the natural numbers via recursive factorisation, iterated to the exponents, generating a corpus of planar rooted trees equivalently represented as Dyck words. This forms a deterministic text endowed with internal rules. Statistical analysis of the corpus reveals that the dictionary and the entropy grow sublinearly, compression shows a non-monotonic trend, and the rank-frequency curves assume a stable parabolic form deviating from Zipf's law. Correlation analysis using mean-squared displacement reveals a transition from normal diffusion to superdiffusion in the associated walk. These findings characterise the tree-encoded sequence as a statistically structured text with long-range correlations grounded in its generative arithmetic law, providing an empirical basis for subsequent theoretical and learnability
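The mean-squared displacement analysis mentioned in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the walk is assumed to take a step of +1 for each opening parenthesis and -1 for each closing one, and the displacement at a given lag is averaged over all starting positions in the text. The function names `walk` and `msd` are hypothetical.

```python
from itertools import accumulate


def walk(text):
    """Positions X_0 = 0, X_1, ..., X_N of the walk driven by a parenthesis string."""
    steps = []
    for c in text:
        if c == "(":
            steps.append(1)   # assumed step rule: "(" moves up
        elif c == ")":
            steps.append(-1)  # assumed step rule: ")" moves down
        else:
            raise ValueError(f"unexpected character: {c!r}")
    return [0] + list(accumulate(steps))


def msd(text, lag):
    """Mean of (X_{i+lag} - X_i)^2 over all valid starting positions i."""
    x = walk(text)
    if not 1 <= lag < len(x):
        raise ValueError("lag must be between 1 and len(text)")
    squares = [(x[i + lag] - x[i]) ** 2 for i in range(len(x) - lag)]
    return sum(squares) / len(squares)


# Toy input only; a real analysis would use the concatenated corpus of Dyck words.
sample = "((()))"
for lag in (1, 2, 3):
    print(lag, msd(sample, lag))
```

With such an estimator, normal diffusion corresponds to a displacement that grows linearly with the lag, while superdiffusion corresponds to growth faster than linear. In practice the scaling exponent is typically read off as the slope of the curve on a log-log plot.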
Taxonomy
Topics: Semigroups and automata theory · Fractal and DNA sequence analysis · Algorithms and Data Compression
