diagno-syst: a tool for accurate inventories in metabarcoding
J.-M. Frigerio, F. Rimet, A. Bouchez, E. Chancerel, P. Chaumeil, F. Salin, S. Thérond, M. Kahlert, A. Franc

TL;DR
This paper introduces diagno-syst, a new algorithm for metabarcoding inventory of freshwater diatoms, emphasizing exact read clustering to improve accuracy over heuristic methods, especially for detecting rare species.
Contribution
The study presents diagno-syst, a supervised clustering algorithm that enhances molecular inventories by using exact read mapping, addressing limitations of heuristic pipelines.
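The idea of supervised clustering by exact mapping can be illustrated with a minimal sketch. This is not the diagno-syst implementation: the summary does not define "informative read" or name the distance used, so the sketch assumes (hypothetically) that a read is informative when every reference within a chosen edit-distance threshold carries the same taxon. The function names, the `max_dist` parameter, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Exact Levenshtein distance by full dynamic programming (no heuristic seeding)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def assign_read(read, references, max_dist=1):
    """Compare the read against every reference (sequence, taxon) pair.

    Assumption: the read is 'informative' only if all references within
    max_dist belong to a single taxon; otherwise None is returned.
    """
    taxa = {taxon for seq, taxon in references
            if edit_distance(read, seq) <= max_dist}
    return next(iter(taxa)) if len(taxa) == 1 else None


def inventory(reads, references, max_dist=1):
    """Count reads per taxon; ambiguous or unmatched reads are 'unassigned'."""
    counts = Counter()
    for read in reads:
        taxon = assign_read(read, references, max_dist)
        counts[taxon if taxon is not None else "unassigned"] += 1
    return counts


# Toy data (hypothetical sequences and taxon labels).
references = [
    ("ACGTACGT", "Taxon A"),
    ("ACGTTCGT", "Taxon A"),
    ("TTGGCCAA", "Taxon B"),
]
reads = ["ACGTACGT", "TTGGCCAT", "GGGGGGGG"]
counts = inventory(reads, references)
```

Every read is compared with every reference, so the cost grows with the product of the number of reads and the number of references. That is the kind of load that motivates running on HPC infrastructure, and it is the trade-off against heuristic pipelines that skip most comparisons.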
Findings
Exact calculations improve accuracy over heuristics.
Exact mapping carries a heavy computation load, which the authors address by running on an HPC (and HTC) infrastructure.
Method benefits biodiversity studies by reducing false positives.
Abstract
Metabarcoding on amplicons is rapidly expanding as a method to produce molecular-based inventories of microbial communities. Here, we work on freshwater diatoms, which are microalgae that can be inventoried on both a morphological and a molecular basis. We have developed an algorithm, in a program called diagno-syst, based on the notion of informative read, which carries out supervised clustering of reads by mapping them exactly one by one on all reads of a well curated and taxonomically annotated reference database. This program has been run on an HPC (and HTC) infrastructure to address computation load. We compare optical and molecular-based inventories on 10 samples from Léman lake, and 30 from Swedish rivers. We track all possibilities of mismatches between both approaches, and compare the results with standard pipelines (with heuristics) like Mothur. We find that the comparison with…
Taxonomy
Topics: Microbial Community Ecology and Physiology · Protist diversity and phylogeny · Environmental DNA in Biodiversity Studies
