Sparsification of Large Ultrametric Matrices: Insights into the Microbial Tree of Life
Evan D. Gorman, Manuel E. Lladser

TL;DR
This paper introduces a wavelet-based sparsification method for large ultrametric matrices, enabling efficient storage and analysis of microbial phylogenetic data, with applications in metagenomics and biodiversity metrics.
Contribution
It develops a novel sparsification technique exploiting tree structures in ultrametric matrices and provides algorithms for matrix compression and spectral approximation.
Findings
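With overwhelmingly high probability, only an asymptotically negligible fraction of off-diagonal entries of large random strictly ultrametric matrices remain non-zero after the wavelet transformation
A fast algorithm compresses such matrices directly from their tree representation
The method applies to microbial diversity analysis and yields phylogenetic insights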
Abstract
Ultrametric matrices have a rich structure that is not apparent from their definition. Notably, the subclass of strictly ultrametric matrices are covariance matrices of certain weighted rooted binary trees. In applications, these matrices can be large and dense, making them difficult to store and handle. In this manuscript, we exploit the underlying tree structure of these matrices to sparsify them via a similarity transformation based on Haar-like wavelets. We show that, with overwhelmingly high probability, only an asymptotically negligible fraction of the off-diagonal entries in random but large strictly ultrametric matrices remain non-zero after the transformation; and develop a fast algorithm to compress such matrices directly from their tree representation. We also identify the subclass of matrices diagonalized by the wavelets and supply a sufficient condition to approximate the…
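The sparsification idea in the abstract can be illustrated with a minimal sketch. It assumes one common construction of Haar-like wavelets on a rooted binary tree (one wavelet per internal node, plus a constant vector); the paper's exact definitions and its fast algorithm may differ. The tree encoding and the helper names `leaf`, `node`, `balanced`, and `build` are hypothetical and chosen only for illustration. The covariance of two leaves is taken to be the total edge length their root-to-leaf paths share.

```python
import numpy as np

# Hypothetical tree encoding: (edge_length, children); children is () for a leaf.
def leaf(length):
    return (length, ())

def node(length, left, right):
    return (length, (left, right))

def balanced(depth, rng):
    """Balanced binary tree of the given depth with random edge lengths."""
    length = rng.uniform(0.5, 1.5)
    if depth == 0:
        return leaf(length)
    return node(length, balanced(depth - 1, rng), balanced(depth - 1, rng))

def build(tree):
    """Return (C, Phi): tree covariance matrix and Haar-like wavelet basis."""
    def count(t):
        return 1 if not t[1] else count(t[1][0]) + count(t[1][1])

    n = count(tree)
    C = np.zeros((n, n))
    # First basis vector: normalized constant vector.
    wavelets = [np.full(n, 1.0 / np.sqrt(n))]

    def visit(t, start):
        """Process the subtree whose leaves occupy indices start, start+1, ..."""
        length, children = t
        if not children:
            size = 1
        else:
            a = visit(children[0], start)      # number of leaves on the left
            b = visit(children[1], start + a)  # number of leaves on the right
            size = a + b
            # Wavelet: positive on left leaves, negative on right, zero elsewhere.
            phi = np.zeros(n)
            phi[start:start + a] = 1.0 / a
            phi[start + a:start + size] = -1.0 / b
            wavelets.append(np.sqrt(a * b / (a + b)) * phi)
        # The edge above this node is shared by every pair of leaves below it.
        C[start:start + size, start:start + size] += length
        return size

    visit(tree, 0)
    return C, np.column_stack(wavelets)

C, Phi = build(balanced(3, np.random.default_rng(0)))
S = Phi.T @ C @ Phi                        # similarity transformation
zeros = int(np.sum(np.abs(S) < 1e-10))     # entries that vanish numerically
print(f"zero entries after transformation: {zeros} of {S.size}")
```

In this sketch, the entry of `S` for two wavelets whose internal nodes are not in an ancestor-descendant relation vanishes, because the covariance block between their disjoint leaf sets is constant and each wavelet sums to zero. Since `Phi` is orthogonal, `S` has the same eigenvalues as `C`.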
Taxonomy
Topics: Advanced Clustering Algorithms Research
