A distance based test on random trees
Ana Georgina Flesia, Ricardo Fraiman

TL;DR
This paper introduces a simple statistical test based on distances between empirical mean trees to compare populations of trees, demonstrating its effectiveness in distinguishing different means through simulations and real genomics data.
Contribution
It proposes a novel distance-based test for comparing tree populations, extending the analogy of the two-sample z test to tree-structured data.
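For reference, the classical two-sample z statistic that the test is modeled on can be written as below. The notation (sample sizes n and m, known variances) is assumed here for illustration and is not taken from the paper; the tree version replaces the difference of sample means with a distance between the two empirical mean trees.

```latex
z = \frac{\bar{X}_n - \bar{Y}_m}{\sqrt{\dfrac{\sigma_X^2}{n} + \dfrac{\sigma_Y^2}{m}}}
```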
Findings
Test effectively separates distributions with different means
Performance validated through simulations on Galton-Watson processes
Applied successfully to genomics data
Abstract
In this paper, we address the question of comparison between populations of trees. We study a statistical test based on the distance between empirical mean trees, as an analog of the two-sample z statistic for comparing two means. Despite its simplicity, the test is quite powerful at separating distributions with different means. It does not, however, distinguish between different populations with the same mean; a more complicated test should be applied in that setting. The performance of the test is studied via simulations on Galton-Watson branching processes. We also show an application to a real data problem in genomics.
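The idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' construction: trees are stored as sets of node addresses, the empirical mean is summarized by per-node occupancy frequencies, the distance uses a hypothetical depth weight of 3^(-depth), and the p-value is calibrated by permutation rather than by any limit theorem from the paper. The function names `gw_tree`, `node_freq`, `mean_distance` and `permutation_pvalue` are likewise hypothetical.

```python
import random
from collections import Counter


def gw_tree(p_child, max_depth, rng):
    """Sample a truncated binary Galton-Watson tree as a set of node addresses.

    A node is a tuple of 0/1 choices from the root (). Each of the two
    potential children is present independently with probability p_child,
    so the offspring law is Binomial(2, p_child) (an illustrative choice).
    """
    tree = {()}
    frontier = [()]
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            for c in (0, 1):
                if rng.random() < p_child:
                    child = node + (c,)
                    tree.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return tree


def node_freq(sample):
    """Fraction of trees in the sample that contain each node."""
    counts = Counter(node for tree in sample for node in tree)
    n = len(sample)
    return {node: k / n for node, k in counts.items()}


def mean_distance(sample_a, sample_b):
    """Depth-weighted L1 distance between the two node-frequency profiles."""
    fa, fb = node_freq(sample_a), node_freq(sample_b)
    total = 0.0
    for node in set(fa) | set(fb):
        weight = 3.0 ** (-len(node))  # hypothetical weighting, not from the paper
        total += weight * abs(fa.get(node, 0.0) - fb.get(node, 0.0))
    return total


def permutation_pvalue(sample_a, sample_b, n_perm, rng):
    """Permutation p-value for the observed distance (assumed calibration)."""
    observed = mean_distance(sample_a, sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean_distance(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Usage: two populations with different branching probabilities.
rng = random.Random(0)
sample_a = [gw_tree(0.4, 5, rng) for _ in range(100)]
sample_b = [gw_tree(0.6, 5, rng) for _ in range(100)]
print(permutation_pvalue(sample_a, sample_b, 200, rng))
```

Because the statistic depends only on node frequencies, two populations with identical frequencies but different joint structure would look the same to it, which mirrors the limitation stated in the abstract for populations sharing the same mean.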
Taxonomy
Topics: Bayesian Methods and Mixture Models · Algorithms and Data Compression · Stochastic processes and statistical mechanics
