TL;DR
EMDUnifrac computes the Unifrac distance in linear time and also identifies the specific organisms responsible for the differences between microbial communities, which makes the result easier to interpret.
Contribution
The paper introduces EMDUnifrac, an exact linear-time algorithm that computes Unifrac distances and pinpoints the taxa responsible for the differences between samples, improving on previous methods in both efficiency and interpretability.
Findings
Computes Unifrac distance in linear time and space.
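Identifies the operational taxonomic units (OTUs) responsible for the observed differences between samples, at no added computational cost.
Applicable to any distribution on a tree, and so to a range of community profiling data.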
Abstract
Both the weighted and unweighted Unifrac distances have been very successfully employed to assess if two communities differ, but do not give any information about how two communities differ. We take advantage of recent observations that the Unifrac metric is equivalent to the so-called earth mover's distance (also known as the Kantorovich-Rubinstein metric) to develop an algorithm that not only computes the Unifrac distance in linear time and space, but also simultaneously finds which operational taxonomic units are responsible for the observed differences between samples. This allows the algorithm, called EMDUnifrac, to determine why given samples are different, not just if they are different, and with no added computational burden. EMDUnifrac can be utilized on any distribution on a tree, and so is particularly suitable to analyzing both operational taxonomic units derived from…
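The linear-time idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; it relies only on the standard fact that, on a tree, the earth mover's distance between two distributions equals the sum over edges of the edge length times the absolute net mass that must cross that edge. The function name `emd_unifrac_sketch`, the parent-array tree encoding, and the requirement that every child has a smaller index than its parent are assumptions made for this illustration.

```python
def emd_unifrac_sketch(parent, length, p, q):
    """Weighted Unifrac as an earth mover's distance on a tree (illustrative sketch).

    Assumptions (hypothetical encoding, not from the paper):
      - nodes are numbered so that every child index is smaller than its parent's;
      - parent[i] is the parent of node i, or None for the root;
      - length[i] is the length of the edge from node i up to its parent;
      - p and q are distributions over the nodes, each summing to 1.

    Returns (distance, edge_contrib), where edge_contrib maps a node i to the
    signed quantity length[i] * (mass of p minus mass of q below that edge).
    """
    # Net surplus of p over q at each node; surplus is pushed toward the root.
    diff = [pi - qi for pi, qi in zip(p, q)]
    distance = 0.0
    edge_contrib = {}

    # Single bottom-up pass: each node is visited once, so the cost is linear.
    for node in range(len(parent)):
        par = parent[node]
        if par is None:
            continue  # the root has no edge above it
        contribution = length[node] * diff[node]
        if contribution != 0:
            # Positive: the subtree is richer in p; negative: richer in q.
            edge_contrib[node] = contribution
        distance += abs(contribution)
        diff[par] += diff[node]  # forward the unmatched mass to the parent

    return distance, edge_contrib


# Toy tree: leaves 0 and 1 hang under internal node 2; leaf 3 and node 2 hang under root 4.
parent = [2, 2, 4, 4, None]
length = [1.0, 1.0, 0.5, 2.0, 0.0]
p = [0.5, 0.5, 0.0, 0.0, 0.0]
q = [0.25, 0.25, 0.0, 0.5, 0.0]

dist, contrib = emd_unifrac_sketch(parent, length, p, q)
print(dist)     # 1.75
print(contrib)  # {0: 0.25, 1: 0.25, 2: 0.25, 3: -1.0}
```

In this toy example the per-edge contributions show which branches drive the distance: the edge above leaf 3 carries the largest term, and its negative sign indicates that the second sample holds more mass there. This is the sense in which the same pass that yields the distance also indicates where the samples differ.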
