Filtering for truth: high-precision taxonomic classification in nanopore shotgun metagenomics data through a KMA-based bioinformatic pipeline (KAPTAIN)
Alexander Van Uffelen, Andrea Gobbo, Marie-Alice Fraiture, Andrés Posadas, Nancy H. C. Roosens, Kathleen Marchal, Sigrid C. J. De Keersmaecker, Kevin Vanneste

TL;DR
This paper introduces KAPTAIN, a new pipeline for nanopore metagenomics that improves species-level classification accuracy by optimizing filtering thresholds and using longer reads.
Contribution
The novel contribution is an optimized taxonomic classification pipeline for nanopore data that achieves high precision while maintaining recall through KMA-based classification and threshold optimization.
Findings
The KAPTAIN pipeline achieves up to 95% median precision with 91.62% recall at 1000M bases of sequencing yield.
Higher sequencing yields significantly improve classification accuracy and lower the limit of detection to 0.1%.
Validation on probiotic-derived mock communities confirmed the pipeline's performance and general applicability.
Abstract
Shotgun metagenomics enables the study of microbial communities without biases from culturing and isolation, but taxonomic classification to the species level remains challenging due to high false positive rates. Oxford Nanopore Technologies offers new opportunities to address these challenges by producing longer reads. However, different pipelines and tools use different methods to reduce false positives, resulting in variable outcomes with limited exploration of what works best in practice. Relative abundance filtering is often used to improve precision by removing false positives, but it also reduces recall by removing true positives. In this study, we optimized a broadly applicable taxonomic classification pipeline for long-read nanopore sequencing data that improves precision. The pipeline uses the tool KMA as the underlying classifier, followed by specific post-processing and optimization…
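The precision/recall trade-off of relative abundance filtering described in the abstract can be illustrated with a minimal Python sketch. This is not KAPTAIN's implementation: the function names, species labels, read counts, and thresholds below are all hypothetical and chosen only to show how raising the threshold removes false positives first and then starts removing low-abundance true positives.

```python
from typing import Dict, Set, Tuple


def filter_by_relative_abundance(counts: Dict[str, int], min_fraction: float) -> Dict[str, float]:
    """Keep only taxa whose share of all classified reads is >= min_fraction."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        taxon: n / total
        for taxon, n in counts.items()
        if n / total >= min_fraction
    }


def precision_recall(predicted: Set[str], truth: Set[str]) -> Tuple[float, float]:
    """Species-level precision and recall of a predicted taxon set."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical read counts per species (toy data, not from the paper).
# A, B and C are truly present; X and Y are false positive hits.
counts = {"species_A": 9000, "species_B": 900, "species_C": 50,
          "species_X": 40, "species_Y": 10}
truth = {"species_A", "species_B", "species_C"}

for threshold in (0.0, 0.0045, 0.01):
    kept = filter_by_relative_abundance(counts, threshold)
    precision, recall = precision_recall(set(kept), truth)
    print(f"threshold={threshold:.4f} kept={sorted(kept)} "
          f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy data, no filtering keeps both false positives, a moderate threshold removes them while retaining every true species, and a stricter threshold also discards the low-abundance true species, which is the loss of recall that motivates optimizing the threshold.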
Taxonomy
Topics: Genomics and Phylogenetic Studies · Microbial Community Ecology and Physiology · Gut microbiota and health
