PhyloGen: Language Model-Enhanced Phylogenetic Inference via Graph Structure Generation
ChenRui Duan, Zelin Zang, Siyuan Li, Yongjie Xu, Stan Z. Li

TL;DR
PhyloGen introduces a novel approach that uses a pre-trained language model to generate and optimize phylogenetic trees, improving accuracy and efficiency over traditional methods by jointly modeling topology and branch lengths.
Contribution
The paper presents a method that uses language models for phylogenetic inference. It avoids reliance on evolutionary models and pre-generated topologies, and it jointly optimizes tree topology and branch lengths.
Findings
Effective on eight real-world datasets
Provides deeper insights into phylogenetic relationships
Outperforms traditional inference methods
Abstract
Phylogenetic trees elucidate evolutionary relationships among species, but phylogenetic inference remains challenging due to the complexity of combining continuous (branch lengths) and discrete parameters (tree topology). Traditional Markov Chain Monte Carlo methods face slow convergence and computational burdens. Existing Variational Inference methods, which require pre-generated topologies and typically treat tree structures and branch lengths independently, may overlook critical sequence features, limiting their accuracy and flexibility. We propose PhyloGen, a novel method leveraging a pre-trained genomic language model to generate and optimize phylogenetic trees without dependence on evolutionary models or aligned sequence constraints. PhyloGen views phylogenetic inference as a conditionally constrained tree structure generation problem, jointly optimizing tree topology and branch…
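To make the distinction between discrete and continuous parameters concrete, here is a minimal sketch of a tree that stores topology and branch lengths together. It is not PhyloGen's implementation. The names `Node`, `to_newick`, and `total_length` are hypothetical and chosen only for illustration, and the taxa and branch lengths are made-up toy values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of a rooted tree (illustrative only, not PhyloGen's data model)."""
    name: Optional[str] = None   # leaf label; None for internal nodes
    length: float = 0.0          # continuous parameter: branch length to the parent
    children: List["Node"] = field(default_factory=list)  # discrete parameter: topology


def _subtree(node: Node) -> str:
    """Serialize a node and its descendants, including the branch above the node."""
    if node.children:
        label = "(" + ",".join(_subtree(c) for c in node.children) + ")"
    else:
        label = node.name or ""
    return f"{label}:{node.length:g}"


def to_newick(root: Node) -> str:
    """Write topology and branch lengths in Newick format (the root has no branch)."""
    if root.children:
        body = "(" + ",".join(_subtree(c) for c in root.children) + ")"
    else:
        body = root.name or ""
    return body + ";"


def total_length(node: Node) -> float:
    """Sum of all branch lengths in the subtree rooted at `node`."""
    return node.length + sum(total_length(c) for c in node.children)


# Toy tree with three taxa; the values are invented for the example.
tree = Node(children=[
    Node(children=[Node("A", 0.1), Node("B", 0.2)], length=0.05),
    Node("C", 0.3),
])
newick = to_newick(tree)
print(newick)
print(total_length(tree))
```

Changing which nodes are siblings changes the topology, while changing a `length` value changes only a continuous parameter. An inference method has to search over both at once, which is the difficulty the abstract describes.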
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques
Methods: Variational Inference
