Inferring taxonomic placement from DNA barcoding allowing discovery of new taxa
Alessandro Zito, Tommaso Rigon, David B. Dunson

TL;DR
This paper introduces BayesANT, a Bayesian nonparametric classifier for DNA barcoding data. It predicts the taxonomic placement of each sequence and can assign sequences to previously unobserved taxa, even when only a short region of the genome is sequenced and the reference database is incomplete.
Contribution
The paper presents a novel Bayesian nonparametric approach to taxonomic classification, BayesANT, which allows new taxa to be discovered and handles incomplete reference databases.
Findings
High accuracy in real data tests
Effective in identifying unknown taxa
Handles limited sequence regions
Abstract
In ecology it has become common to apply DNA barcoding to biological samples, leading to datasets containing a large number of nucleotide sequences. The focus is then on inferring the taxonomic placement of each of these sequences by leveraging existing databases of reference sequences from known taxa. This is highly challenging because i) due to cost considerations, sequencing is typically only available for a relatively small region of the genome; and ii) many of the sequences come from organisms that are either unknown to science or have no reference sequences available. These issues can lead to substantial classification uncertainty, particularly in inferring new taxa. To address these challenges, we propose a new class of Bayesian nonparametric taxonomic classifiers, BayesANT, which use species sampling model priors to allow new taxa to be discovered at each…
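The abstract names species sampling model priors as the mechanism that lets BayesANT reserve probability for taxa absent from the reference database. As a minimal sketch, assuming a Pitman-Yor process as the species sampling model (one well-known member of that class; this excerpt does not say which prior BayesANT uses or how it is applied across taxonomic ranks), the predictive rule below gives the probability that the next sequence belongs to each already observed taxon or to a new one. The function `pitman_yor_predictive`, its parameters `alpha` and `sigma`, and the taxon labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def pitman_yor_predictive(labels, alpha, sigma):
    """Predictive label probabilities for the next observation under a
    Pitman-Yor species sampling model (an illustrative assumption, not
    necessarily the prior used by BayesANT).

    labels: taxon labels of the sequences observed so far
    alpha:  concentration parameter, must satisfy alpha > -sigma
    sigma:  discount parameter, 0 <= sigma < 1

    Returns a dict mapping each observed taxon to its probability, plus
    the key None for the probability of a previously unseen taxon.
    """
    if not 0 <= sigma < 1:
        raise ValueError("sigma must be in [0, 1)")
    if alpha <= -sigma:
        raise ValueError("alpha must exceed -sigma")

    counts = Counter(labels)      # n_k: sequences per observed taxon
    n = sum(counts.values())      # total number of observed sequences
    k = len(counts)               # K: number of distinct observed taxa
    denom = alpha + n

    # Existing taxon k: (n_k - sigma) / (alpha + n)
    probs = {taxon: (n_k - sigma) / denom for taxon, n_k in counts.items()}
    # New taxon: (alpha + sigma * K) / (alpha + n)
    probs[None] = (alpha + sigma * k) / denom
    return probs


# Hypothetical example: 10 reference sequences spread over 3 taxa.
observed = ["taxon_A"] * 6 + ["taxon_B"] * 3 + ["taxon_C"]
probs = pitman_yor_predictive(observed, alpha=1.0, sigma=0.25)
print({taxon: round(p, 4) for taxon, p in probs.items()})
```

The probabilities sum to one because the existing-taxon terms total (n - sigma K)/(alpha + n). With a positive discount `sigma`, the probability of a new taxon, (alpha + sigma K)/(alpha + n), grows with the number K of distinct taxa already seen; setting `sigma=0` recovers the Dirichlet process rule alpha/(alpha + n).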
Taxonomy
Topics: Identification and Quantification in Food · Genomics and Phylogenetic Studies · Genetic diversity and population structure
