A distance based test on random trees

Ana Georgina Flesia; Ricardo Fraiman

arXiv:0708.1733·math.ST·August 14, 2007

TL;DR

This paper introduces a simple statistical test based on distances between empirical mean trees to compare populations of trees, demonstrating its effectiveness in distinguishing different means through simulations and real genomics data.

Contribution

It proposes a distance-based test for comparing populations of trees, built as an analog of the two-sample z test for tree-structured data.

Findings

01 Test effectively separates distributions with different means

02 Performance validated through simulations on Galton-Watson processes

03 Applied successfully to genomics data

Abstract

In this paper, we address the question of comparison between populations of trees. We study a statistical test based on the distance between empirical mean trees, as an analog of the two-sample z statistic for comparing two means. Despite its simplicity, we can report that the test is quite powerful at separating distributions with different means, but it does not distinguish between different populations with the same mean; a more complicated test should be applied in that setting. The performance of the test is studied via simulations on Galton-Watson branching processes. We also show an application to a real data problem in genomics.
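The idea in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: trees are stored as sets of node addresses, the distance is a depth-weighted symmetric difference with a hypothetical `decay` parameter, the empirical mean tree is taken to be the majority-vote tree, and the p-value comes from a permutation procedure swapped in here because the abstract does not say how the test is calibrated. The function names are likewise hypothetical.

```python
import random
from collections import Counter


def galton_watson_tree(p_offspring, max_depth, rng):
    """Sample a Galton-Watson tree as a frozenset of node addresses.

    The root is the empty tuple (); child i of node v is v + (i,).
    p_offspring[k] is the probability that a node has k children.
    Growth is truncated after max_depth generations (an assumption
    made here to keep every tree finite).
    """
    tree = {()}
    frontier = [()]
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            k = rng.choices(range(len(p_offspring)), weights=p_offspring)[0]
            for i in range(k):
                child = node + (i,)
                tree.add(child)
                next_frontier.append(child)
        frontier = next_frontier
    return frozenset(tree)


def distance(t1, t2, decay=0.5):
    """Assumed distance: each node in exactly one tree costs decay**depth."""
    return sum(decay ** len(v) for v in t1 ^ t2)


def empirical_mean_tree(trees):
    """Assumed mean: nodes present in more than half of the sample.

    A node's parent occurs at least as often as the node itself,
    so the result is again a tree.
    """
    counts = Counter(v for t in trees for v in t)
    return frozenset(v for v, c in counts.items() if 2 * c > len(trees))


def permutation_test(sample_a, sample_b, n_perm=500, seed=0):
    """Return (observed distance between mean trees, permutation p-value)."""
    rng = random.Random(seed)
    observed = distance(empirical_mean_tree(sample_a),
                        empirical_mean_tree(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the statistic.
        rng.shuffle(pooled)
        stat = distance(empirical_mean_tree(pooled[:n_a]),
                        empirical_mean_tree(pooled[n_a:]))
        if stat >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # Two hypothetical offspring laws on {0, 1, 2} children.
    sample_a = [galton_watson_tree([0.2, 0.3, 0.5], 4, rng) for _ in range(40)]
    sample_b = [galton_watson_tree([0.6, 0.3, 0.1], 4, rng) for _ in range(40)]
    print(permutation_test(sample_a, sample_b))
```

Because the statistic depends only on the two mean trees, two populations whose mean trees coincide produce a distance near zero regardless of how else they differ, which matches the limitation stated in the abstract.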

Taxonomy

Topics: Bayesian Methods and Mixture Models · Algorithms and Data Compression · Stochastic processes and statistical mechanics