TL;DR
Mycorrhiza is a machine learning method that uses phylogenetic networks and random forests to improve genotype assignment accuracy across diverse datasets, especially when the assumptions of existing methods are violated.
Contribution
It introduces a novel phylogenetic-network-based feature engineering approach to genotype assignment that outperforms existing methods in accuracy and robustness.
Findings
Outperforms STRUCTURE and ADMIXTURE in accuracy.
Provides good estimates of mixture proportions.
Excels on datasets with high F_ST or with deviations from Hardy-Weinberg equilibrium.
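The two dataset properties named in the last finding can be made concrete for a single biallelic locus. The sketch below is illustrative only and is not taken from the paper: it uses Nei's G_ST (one of several F_ST estimators, here with populations weighted equally) and the inbreeding coefficient F_IS = 1 - H_obs / H_exp as a simple measure of departure from Hardy-Weinberg proportions. The function names and toy values are hypothetical.

```python
"""Illustrative per-locus statistics for population differentiation and HWE deviation."""


def allele_freq(genotypes):
    # genotypes: one entry per diploid individual, counting copies (0, 1 or 2)
    # of a chosen reference allele at this locus.
    return sum(genotypes) / (2 * len(genotypes))


def fst_nei(pop_freqs):
    # Nei's G_ST for one biallelic locus, weighting populations equally.
    # H_T: expected heterozygosity of the pooled allele frequency.
    # H_S: mean expected heterozygosity within populations.
    p_bar = sum(pop_freqs) / len(pop_freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = sum(2 * p * (1 - p) for p in pop_freqs) / len(pop_freqs)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def f_is(genotypes):
    # Inbreeding coefficient: about 0 under Hardy-Weinberg proportions,
    # positive for a heterozygote deficit, negative for a heterozygote excess.
    p = allele_freq(genotypes)
    h_exp = 2 * p * (1 - p)
    h_obs = sum(1 for g in genotypes if g == 1) / len(genotypes)
    return 0.0 if h_exp == 0 else 1 - h_obs / h_exp


if __name__ == "__main__":
    # Two populations with strongly different allele frequencies: high differentiation.
    print(round(fst_nei([0.9, 0.1]), 2))  # 0.64
    # No heterozygotes at all: maximal heterozygote deficit.
    print(f_is([0, 0, 2, 2]))  # 1.0
```

Values of `fst_nei` near 0 mean the populations share allele frequencies, while values near 1 mean nearly fixed differences; an `f_is` far from 0 signals genotype counts that a Hardy-Weinberg model would not expect.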
Abstract
Motivation: The genotype assignment problem consists of predicting, from the genotype of an individual, which of a known set of populations it originated from. The problem arises in a variety of contexts, including wildlife forensics, invasive species detection and biodiversity monitoring. Existing approaches perform well under ideal conditions but are sensitive to a variety of common violations of the assumptions they rely on.
Results: In this article, we introduce Mycorrhiza, a machine learning approach for the genotype assignment problem. Our algorithm makes use of phylogenetic networks to engineer features that encode the evolutionary relationships among samples. Those features are then used as input to a Random Forests classifier. The classification accuracy was assessed on multiple published empirical SNP, microsatellite or consensus sequence datasets with wide ranges of size,…
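To make the problem statement concrete, the sketch below implements a classical frequency-based assignment rule, not Mycorrhiza. It estimates per-population allele frequencies from reference individuals and assigns a new genotype to the population under which it is most likely, assuming Hardy-Weinberg equilibrium at each locus and independence between loci. These are the kind of model assumptions the abstract says existing approaches rely on. The function names, the `pseudocount` parameter and the toy SNP data are all hypothetical.

```python
import math

# Toy reference panel (hypothetical): per population, one list per individual,
# each entry = copies (0, 1 or 2) of a reference allele at a biallelic SNP.
REFERENCE = {
    "north": [[2, 2, 0], [2, 1, 0], [2, 2, 1]],
    "south": [[0, 0, 2], [0, 1, 2], [1, 0, 2]],
}


def train_allele_freqs(pop_genotypes, pseudocount=0.5):
    # Estimate the reference-allele frequency per locus and population.
    # The pseudocount keeps frequencies strictly between 0 and 1 so logs stay finite.
    freqs = {}
    for pop, individuals in pop_genotypes.items():
        n_loci = len(individuals[0])
        n_alleles = 2 * len(individuals)
        freqs[pop] = [
            (sum(ind[locus] for ind in individuals) + pseudocount)
            / (n_alleles + 2 * pseudocount)
            for locus in range(n_loci)
        ]
    return freqs


def log_likelihood(genotype, p_list):
    # Hardy-Weinberg genotype probabilities at each locus, multiplied across
    # loci (i.e. summed in log space), which assumes linkage equilibrium.
    ll = 0.0
    for g, p in zip(genotype, p_list):
        if g == 2:
            ll += 2 * math.log(p)
        elif g == 1:
            ll += math.log(2 * p * (1 - p))
        else:
            ll += 2 * math.log(1 - p)
    return ll


def assign(genotype, freqs):
    # Return the most likely source population and the per-population scores.
    scores = {pop: log_likelihood(genotype, p) for pop, p in freqs.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    freqs = train_allele_freqs(REFERENCE)
    best, scores = assign([2, 2, 0], freqs)
    print(best, scores)
```

When populations are in Hardy-Weinberg and linkage equilibrium this rule works well; when those assumptions fail, its likelihoods are misspecified, which is the situation in which the abstract reports Mycorrhiza's feature-based classifier to be more robust.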
