Generation of hierarchically correlated multivariate symbolic sequences
Mi. Tumminello, F. Lillo, R. N. Mantegna

TL;DR
This paper presents an algorithm to generate multivariate symbolic sequences with a specified hierarchical similarity structure, useful for modeling complex relationships in data such as phylogenies.
Contribution
It introduces a novel algorithm for generating hierarchically correlated multivariate symbolic sequences, extending the hierarchically nested factor model to finite alphabets.
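The summary does not spell out the generating mechanism. The sketch below only illustrates what a finite-alphabet analogue of a nested factor model could look like; it is not the authors' algorithm. In the continuous hierarchically nested factor model, each series is roughly a weighted sum of factors attached to the nodes on its path from the root. Here, as an assumption, each position of a leaf series instead copies the symbol of one node on that path. The tree, the weights, and the names `generate` and `leaf_paths` are all hypothetical.

```python
import random

# Hypothetical example hierarchy: internal nodes map to their children; leaves are series.
TREE = {"R": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2"]}
# Hypothetical node weights: a larger weight means the node's symbol is copied more often.
WEIGHTS = {"R": 1, "A": 2, "B": 2, "a1": 1, "a2": 1, "b1": 1, "b2": 1}


def leaf_paths(tree, root):
    """Map each leaf to its path [root, ..., leaf] in a tree given as {node: [children]}."""
    paths = {}
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        children = tree.get(node, [])
        if not children:
            paths[node] = path
        for child in children:
            stack.append((child, path + [child]))
    return paths


def generate(tree, root, weights, alphabet, length, seed=None):
    """Generate one symbol string per leaf with a copy-from-a-path-node rule (assumption).

    At every position each node draws a fresh uniform symbol. Each leaf then copies
    the symbol of one node on its root-to-leaf path, chosen with probability
    proportional to the node weights. The leaf's own node acts as idiosyncratic noise.
    """
    rng = random.Random(seed)
    paths = leaf_paths(tree, root)
    # Ordered, de-duplicated node list so that a fixed seed gives reproducible output.
    nodes = list(dict.fromkeys(n for path in paths.values() for n in path))
    series = {leaf: [] for leaf in paths}
    for _ in range(length):
        symbol = {n: rng.choice(alphabet) for n in nodes}
        for leaf, path in paths.items():
            chosen = rng.choices(path, weights=[weights[n] for n in path])[0]
            series[leaf].append(symbol[chosen])
    return {leaf: "".join(s) for leaf, s in series.items()}


def hamming(x, y):
    """Fraction of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    out = generate(TREE, "R", WEIGHTS, "ACGT", 5000, seed=1)
    print(hamming(out["a1"], out["a2"]))  # siblings: expected value 33/64
    print(hamming(out["a1"], out["b1"]))  # share only the root: expected value 45/64
```

With these weights, a leaf copies the root, its parent, or its own node with probabilities 1/4, 1/2 and 1/4. For a four-letter alphabet, sibling leaves are therefore expected to differ at 33/64 of positions, and leaves that share only the root at 45/64. Raising the weight of an internal node tightens the cluster below it.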
Findings
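The algorithm generates sequences whose similarity structure follows an arbitrary target hierarchy, such as one obtained by hierarchical clustering of an empirical Hamming distance matrix.
In a phylogenetic application, the method is used to investigate the relationship between the bootstrap value of a node in a phylogeny and the probability of finding that node in the true phylogeny.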
Provides a new tool for simulating complex hierarchical symbolic data.
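The usual (Felsenstein) bootstrap value of a node is the fraction of trees that contain the same clade, where each tree is rebuilt from an alignment whose columns have been resampled with replacement. The sketch below assumes average-linkage clustering on Hamming distances as the tree-building step. The summary does not say which method the paper uses, and the names `upgma_clades`, `bootstrap_support` and the example data are hypothetical.

```python
import random
from itertools import combinations


def hamming(x, y):
    """Fraction of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def upgma_clades(seqs):
    """Non-trivial clades (frozensets of names) of an average-linkage tree."""
    names = list(seqs)
    d = {frozenset((a, b)): hamming(seqs[a], seqs[b]) for a, b in combinations(names, 2)}

    def dist(c1, c2):
        # Average of all pairwise leaf distances between two clusters.
        return sum(d[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [frozenset([n]) for n in names]
    clades = set()
    # Stop at two clusters: the final merge would only produce the trivial full set.
    while len(clusters) > 2:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: dist(*p))
        merged = c1 | c2
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        clades.add(merged)
    return clades


def bootstrap_support(seqs, n_boot=200, seed=None):
    """Fraction of column-resampled replicates whose tree contains each reference clade."""
    rng = random.Random(seed)
    length = len(next(iter(seqs.values())))
    counts = {}
    for _ in range(n_boot):
        cols = [rng.randrange(length) for _ in range(length)]
        replicate = {name: "".join(s[i] for i in cols) for name, s in seqs.items()}
        for clade in upgma_clades(replicate):
            counts[clade] = counts.get(clade, 0) + 1
    return {clade: counts.get(clade, 0) / n_boot for clade in upgma_clades(seqs)}


SEQS = {"a1": "A" * 40, "a2": "A" * 36 + "C" * 4, "b1": "G" * 40, "b2": "G" * 36 + "T" * 4}

if __name__ == "__main__":
    for clade, support in bootstrap_support(SEQS, seed=0).items():
        print(sorted(clade), support)
```

In this toy alignment every column separates the two groups, so both clades are expected to receive full support. On sequences simulated from a known hierarchy, the same routine lets one compare bootstrap values against whether each clade is actually present in the generating tree.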
Abstract
We introduce an algorithm to generate multivariate series of symbols from a finite alphabet with a given hierarchical structure of similarities. The target hierarchical structure of similarities is arbitrary, for instance, the one obtained by some hierarchical clustering procedure as applied to an empirical matrix of Hamming distances. The algorithm can be interpreted as the finite alphabet equivalent of the recently introduced hierarchically nested factor model (M. Tumminello et al. EPL 78 (3) 30006 (2007)). The algorithm is based on a generating mechanism that is different from the one used in the mutation rate approach. We apply the proposed methodology to investigate the relationship between the bootstrap value associated with a node of a phylogeny and the probability of finding that node in the true phylogeny.
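The abstract mentions obtaining a target hierarchy by hierarchical clustering of an empirical Hamming distance matrix, but it does not name the clustering procedure. The sketch below uses average linkage (UPGMA) purely as an example. The sequences and the function names `hamming` and `average_linkage` are hypothetical.

```python
from itertools import combinations


def hamming(x, y):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def average_linkage(seqs):
    """Agglomerative average-linkage (UPGMA) clustering on Hamming distances.

    seqs maps names to sequences. Returns the merges as (cluster1, cluster2, height),
    with clusters given as frozensets of names, in the order they are merged.
    """
    names = list(seqs)
    d = {frozenset((a, b)): hamming(seqs[a], seqs[b]) for a, b in combinations(names, 2)}
    clusters = [frozenset([n]) for n in names]

    def dist(c1, c2):
        # Average of all pairwise leaf distances between the two clusters.
        return sum(d[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: dist(*p))
        merges.append((c1, c2, dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


SEQS = {"a1": "AAAAAAAAAA", "a2": "AAAAAAAACC", "b1": "GGGGGGGGGG", "b2": "GGGGGGGTTT"}

if __name__ == "__main__":
    for c1, c2, height in average_linkage(SEQS):
        print(sorted(c1), sorted(c2), round(height, 3))
```

The merge heights define the hierarchy of similarities (a dendrogram) that a generator of the kind described above would take as its target. This quadratic-per-step version is meant for small examples only.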
