On the variety of shapes in digital trees
Jeffrey Gaither, Hosam Mahmoud, Mark Daniel Ward

TL;DR
This paper studies the joint distribution of the occurrence counts of nonoverlapping motifs in digital data. It shows that, under appropriate normalization, any linear combination of these counts is asymptotically normal, which in many instances means the counts themselves are jointly multivariate normal. The results bear on the variability of shapes in digital trees.
Contribution
It gives an explicit measure-theoretic formulation for countably infinite collections of motifs in digital data and establishes limiting normality for linear combinations of the motif counts.
Findings
Under appropriate normalization, any linear combination of the motif counts has a limiting normal distribution.
In many instances, the individual motif counts are jointly asymptotically multivariate normal.
The proofs use combinatorics on words, integral transforms, and poissonization.
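As background on poissonization, the following is a minimal sketch of the standard reason the technique is useful for digital trees. It is not the paper's argument. It assumes keys are random binary strings that the root of a digital tree splits by their first bit. The function names (`poisson_sample`, `first_bit_counts`, `compare_models`) and the parameter values are hypothetical choices made for illustration. With a fixed number of keys the two subtree sizes are perfectly negatively correlated. When the number of keys is Poisson distributed, the subtree sizes are independent, so their sample correlation is near zero.

```python
import math
import random


def poisson_sample(rng, mean):
    """Draw one Poisson(mean) variate with Knuth's product method (small means only)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def first_bit_counts(rng, num_keys):
    """Split num_keys random binary keys by first bit; return (zeros, ones)."""
    ones = sum(rng.getrandbits(1) for _ in range(num_keys))
    return num_keys - ones, ones


def correlation(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


def compare_models(seed=1, mean_keys=20, trials=5000):
    """Return (fixed-size correlation, Poisson-size correlation) of subtree sizes."""
    rng = random.Random(seed)
    fixed = [first_bit_counts(rng, mean_keys) for _ in range(trials)]
    poissonized = [
        first_bit_counts(rng, poisson_sample(rng, mean_keys))
        for _ in range(trials)
    ]
    return correlation(fixed), correlation(poissonized)


if __name__ == "__main__":
    corr_fixed, corr_poisson = compare_models()
    print(f"fixed-size model:   correlation = {corr_fixed:.3f}")
    print(f"Poisson-size model: correlation = {corr_poisson:.3f}")
```

The independence in the Poisson model is what makes subtree-level quantities easier to analyze. Results are then typically transferred back to the fixed-size model by a separate depoissonization step.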
Abstract
We study the joint distribution of the number of occurrences of members of a collection of nonoverlapping motifs in digital data. We deal with finite and countably infinite collections. For infinite collections, the setting requires that we be very explicit about the specification of the underlying measure-theoretic formulation. We show that (under appropriate normalization) for such a collection, any linear combination of the number of occurrences of each of the motifs in the data has a limiting normal distribution. In many instances, this can be interpreted in terms of the number of occurrences of individual motifs: They have a multivariate normal distribution. The methods of proof include combinatorics on words, integral transforms, and poissonization.
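The step from linear combinations to a multivariate statement can be sketched as follows for a finite collection. The notation is assumed for illustration and is not taken from the paper: $X_{n,1},\dots,X_{n,k}$ are the counts of $k$ motifs in data of size $n$, and $a_1,\dots,a_k$ are real coefficients. The exact normalization used in the paper is not given in the abstract, so the common scaling $s_n$ below is a hypothetical choice.

```latex
% Limiting normality of a linear combination (form assumed from the abstract):
\[
  \frac{\sum_{i=1}^{k} a_i X_{n,i} - \mathbb{E}\bigl[\sum_{i=1}^{k} a_i X_{n,i}\bigr]}
       {\sqrt{\operatorname{Var}\bigl[\sum_{i=1}^{k} a_i X_{n,i}\bigr]}}
  \;\xrightarrow{d}\; \mathcal{N}(0,1).
\]
% Cramer-Wold device: for random vectors Z_n, Z in R^k,
\[
  Z_n \xrightarrow{d} Z
  \quad\Longleftrightarrow\quad
  t^{\top} Z_n \xrightarrow{d} t^{\top} Z \quad \text{for every } t \in \mathbb{R}^{k}.
\]
% Hypothetical common scaling s_n: if
%   Z_n = (X_{n,i} - E[X_{n,i}])_{i=1..k} / s_n
% has covariance converging to a matrix Sigma, and every t^T Z_n is
% asymptotically N(0, t^T Sigma t), then
\[
  Z_n \;\xrightarrow{d}\; \mathcal{N}_k(0, \Sigma).
\]
```

The last step needs a single scaling that works for all the counts at once. That requirement is one plausible reading of why the abstract says the multivariate interpretation holds "in many instances" rather than always.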
