LV Barcoding: locality sensitive hashing-based tool for rapid species identification in DNA barcoding
Long Fan, Ka Hou Chu

TL;DR
LV Barcoding is a fast species identification tool for DNA barcoding based on locality sensitive hashing. It is reported to be more accurate than BLAST and to match a single query against ~114,000 reference barcodes within 10 seconds on a desktop computer.
Contribution
Introduces LV Barcoding, combining locality sensitive hashing and VIP Barcoding, for fast and accurate species identification in DNA barcoding datasets.
Findings
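LV Barcoding has higher accuracy than BLAST when assessed on the BOLD data release.
LV Barcoding matches a single query against ~114,000 reference barcodes within 10 seconds on a desktop computer.
The program is publicly available at http://msl.sls.cuhk.edu.hk/vipbarcoding/.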
Abstract
DNA barcoding has emerged as a cost-effective approach for species identification. However, the scarcity of tools used for searching the booming reference database becomes an obstacle, currently with BLAST as the only practical choice. Here, we propose a program - LV Barcoding - based on both the random hyperplane projection-based locality sensitive hashing method and the composition vector-based VIP Barcoding for fast species identification. The performance of LV Barcoding is assessed on the data release of BOLD. LV Barcoding has higher accuracy than BLAST, and is able to match a single query against ~114,000 reference barcodes within 10 seconds on a desktop computer. This program is available at http://msl.sls.cuhk.edu.hk/vipbarcoding/.
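The random hyperplane projection idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: plain k-mer counts stand in for the composition vector used by VIP Barcoding, and every name here (`kmer_vector`, `make_hyperplanes`, `signature`, `hamming`), the parameters (`k=3`, `n_bits=32`), and the toy sequences are assumptions made for illustration only.

```python
import random
from itertools import product


def kmer_vector(seq, k=3):
    """Count each DNA k-mer in seq, returned as a vector of length 4**k."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    vec = [0.0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k].upper())
        if j is not None:  # skip k-mers containing ambiguous bases such as N
            vec[j] += 1.0
    return vec


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian normal vectors, one per hyperplane."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: which side of that hyperplane vec falls on."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(a, b):
    """Number of positions at which two signatures differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy reference set; real barcodes are much longer.
references = {
    "species_A": "ACGTACGTTAGCTAGCTAGGATCCGATCGATTACG",
    "species_B": "TTGACCGGTTAACCGGTTGACCAATTGGCCAATTG",
}
planes = make_hyperplanes(dim=4 ** 3, n_bits=32)
ref_sigs = {name: signature(kmer_vector(s), planes) for name, s in references.items()}

# Hash the query with the same hyperplanes and pick the closest signature.
query = "ACGTACGTTAGCTAGCTAGGATCCGATCGATTACC"
q_sig = signature(kmer_vector(query), planes)
best = min(ref_sigs, key=lambda name: hamming(q_sig, ref_sigs[name]))
print(best, hamming(q_sig, ref_sigs[best]))
```

For random hyperplanes, the chance that two vectors disagree on any given bit is proportional to the angle between them, so the Hamming distance between short signatures gives a cheap estimate of how similar two sequences' k-mer profiles are. A tool built on this idea would presumably use the signatures only to shortlist candidates and then compare those candidates more precisely; that second step is an assumption here, not something the abstract spells out.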
Taxonomy
Topics: Identification and Quantification in Food · Environmental DNA in Biodiversity Studies · Genomics and Phylogenetic Studies
