Large-scale inference of correlation among mixed-type biological traits with phylogenetic multivariate probit models
Zhenyu Zhang, Akihiko Nishimura, Paul Bastide, Xiang Ji, Rebecca P. Payne, Philip Goulder, Philippe Lemey, Marc A. Suchard

TL;DR
This paper introduces a scalable Bayesian inference method for estimating correlations among mixed-type biological traits using phylogenetic multivariate probit models, enabling analysis of large datasets with high-dimensional truncated normal distributions.
Contribution
It develops a novel inference approach combining the bouncy particle sampler with dynamic programming to efficiently handle high-dimensional truncated normals in phylogenetic models.
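The bouncy particle sampler (BPS) can be illustrated on a small orthant-truncated Gaussian. The sketch below is a hedged illustration under stated assumptions, not the authors' implementation. The target is N(mu, P^{-1}) restricted to signs[i] * x[i] >= 0, so a binary outcome of 1 or 0 would map to a sign of +1 or -1. The particle moves in straight lines and bounces off the density's gradient at event times drawn by inverting the integrated linear rate. It reflects specularly at the truncation planes and refreshes its velocity at a constant rate. The helper names `bps_orthant_normal`, `matvec` and `dot` are hypothetical. The products with P are plain dense matrix-vector products. The paper's dynamic programming, which is what makes these evaluations efficient in tree-structured models, is not reproduced here.

```python
import math
import random


def matvec(P, x):
    """Dense matrix-vector product P x, with P given as a list of rows."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]


def dot(u, w):
    return sum(a * b for a, b in zip(u, w))


def bps_orthant_normal(P, mu, signs, n_samples, dt=0.5, refresh_rate=1.0, seed=0):
    """Hypothetical BPS sketch for N(mu, P^{-1}) restricted to {x : signs[i] * x[i] >= 0}.

    P is the precision matrix; signs[i] is +1, -1, or 0 (0 = unconstrained).
    Returns positions read off the continuous trajectory every dt time units.
    """
    rng = random.Random(seed)
    d = len(mu)
    x = [float(s) for s in signs]  # feasible start: +1, -1 or 0 per coordinate
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    samples, clock, next_record = [], 0.0, dt
    while len(samples) < n_samples:
        # Bounce time: rate max(0, a + b t) along x + t v, gradient P (x - mu).
        grad = matvec(P, [xi - mi for xi, mi in zip(x, mu)])
        a, b = dot(v, grad), dot(v, matvec(P, v))
        e = rng.expovariate(1.0)
        t_bounce = (-a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * b * e)) / b
        t_refresh = rng.expovariate(refresh_rate)
        # Boundary time: first coordinate that would cross its plane x_i = 0.
        t_wall, wall = math.inf, None
        for i in range(d):
            if signs[i] * v[i] < 0:
                t_i = -x[i] / v[i]
                if t_i < t_wall:
                    t_wall, wall = t_i, i
        tau = min(t_bounce, t_refresh, t_wall)
        # Record evenly spaced positions along the segment [clock, clock + tau].
        while next_record <= clock + tau and len(samples) < n_samples:
            s = next_record - clock
            samples.append([xi + s * vi for xi, vi in zip(x, v)])
            next_record += dt
        x = [xi + tau * vi for xi, vi in zip(x, v)]
        clock += tau
        if tau == t_wall:
            x[wall] = 0.0       # snap exactly onto the boundary
            v[wall] = -v[wall]  # specular reflection off the plane x_wall = 0
        elif tau == t_bounce:
            g = matvec(P, [xi - mi for xi, mi in zip(x, mu)])
            gg = dot(g, g)
            if gg > 0.0:
                c = 2.0 * dot(v, g) / gg
                v = [vi - c * gi for vi, gi in zip(v, g)]  # reflect v off the gradient
        else:
            v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # velocity refreshment
    return samples
```

With an identity precision, zero mean and signs [1, -1], each coordinate is a half-normal. The sample means should therefore sit near +sqrt(2/π) ≈ 0.80 and -0.80, and every recorded point stays inside the orthant.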
Findings
Successfully applied to 535 HIV viruses with 24 traits
Estimated across-trait correlations and detected factors affecting viral virulence and immune escape
Achieved computational cost that scales linearly with sample size
Abstract
Inferring concerted changes among biological traits along an evolutionary history remains an important yet challenging problem. Besides adjusting for spurious correlation induced by the shared history, the task also requires sufficient flexibility and computational efficiency to incorporate multiple continuous and discrete traits as data size increases. To accomplish this, we jointly model mixed-type traits by assuming latent parameters for binary outcome dimensions at the tips of an unknown tree informed by molecular sequences. This gives rise to a phylogenetic multivariate probit model. With large sample sizes, posterior computation under this model is problematic, as it requires repeated sampling from a high-dimensional truncated normal distribution. Current best practices employ multiple-try rejection sampling that suffers from slow mixing and a computational cost that scales…
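The latent-variable construction in the abstract can be made concrete with a toy simulator. This is a hedged sketch under assumptions not taken from the paper. The tree is fixed and known, whereas the paper treats it as unknown and informed by sequences. Latent traits evolve by Brownian motion from a root value of zero with per-unit-time trait covariance Sigma. A binary trait is recorded as 1 when its latent value is positive and 0 otherwise. Under Brownian motion, the tip latent values have covariance C ⊗ Sigma, where C[i][j] is the branch length shared by the root-to-tip paths of tips i and j; this shared structure is the source of the spurious correlation the model adjusts for. Conditioning on the binary outcomes restricts those latent coordinates to orthants, which yields the truncated normal the abstract refers to. All function names and the dict-of-children tree encoding are hypothetical.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with S = L L^T (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(S[i][i] - s) if i == j else (S[i][j] - s) / L[j][j]
    return L


def shared_path_covariance(tree, root):
    """Tip order and C[i][j] = branch length shared by the root paths of tips i, j.

    tree maps each internal node to a list of (child, branch_length).
    """
    path = {root: {}}  # node -> {edge: length} along the root-to-node path
    stack = [root]
    while stack:
        node = stack.pop()
        for child, length in tree.get(node, []):
            path[child] = dict(path[node])
            path[child][(node, child)] = length
            stack.append(child)
    tips = sorted(n for n in path if n not in tree)
    C = [[sum(path[a][e] for e in path[a].keys() & path[b].keys()) for b in tips]
         for a in tips]
    return tips, C


def simulate_tips(tree, root, Sigma, binary_dims, seed=0):
    """Simulate latent Brownian motion down the tree, then threshold binary dims."""
    rng = random.Random(seed)
    L = cholesky(Sigma)
    p = len(Sigma)
    latent = {root: [0.0] * p}  # assumed root value of zero
    stack = [root]
    while stack:
        node = stack.pop()
        for child, length in tree.get(node, []):
            z = [rng.gauss(0.0, 1.0) for _ in range(p)]
            # Increment ~ N(0, length * Sigma) via the scaled Cholesky factor.
            step = [math.sqrt(length) * sum(L[i][k] * z[k] for k in range(i + 1))
                    for i in range(p)]
            latent[child] = [a + b for a, b in zip(latent[node], step)]
            stack.append(child)
    observed = {}
    for leaf in (n for n in latent if n not in tree):
        # Continuous dims are observed directly; binary dims only through their sign.
        observed[leaf] = [(1 if latent[leaf][j] > 0 else 0) if j in binary_dims
                          else latent[leaf][j] for j in range(p)]
    return latent, observed
```

For example, with `tree = {"r": [("n1", 1.0), ("C", 2.0)], "n1": [("A", 0.5), ("B", 1.5)]}`, tips A and B share the branch r→n1 of length 1.0, so C[A][B] = 1.0, while tip C shares nothing with either.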
Taxonomy
Topics: Bayesian Methods and Mixture Models · Genomics and Phylogenetic Studies · Genetic diversity and population structure
