# NanoMGT: Marker gene typing of low complexity mono-species metagenomic samples using noisy long reads

**Authors:** Malte B Hallgren, Philip T L C Clausen, Frank M Aarestrup

PMC · DOI: 10.1093/biomethods/bpae057 · Biology Methods & Protocols · 2024-08-06

## TL;DR

NanoMGT is a marker gene typing tool that improves the identification of strain diversity in low-complexity mono-species metagenomic samples sequenced with noisy long reads.

## Contribution

NanoMGT introduces a novel scoring system for accurate marker gene typing in low-complexity mono-species samples.

## Key findings

- On a simulated multi-strain sample of seven bacterial species, NanoMGT outperformed existing tools at detecting strain-specific marker genes in noisy long-read data.
- The scoring system balances sensitivity and precision by rewarding mutations that co-occur across different reads and penalizing densely grouped, likely erroneous variants.
- NanoMGT shows potential as a post-binning tool for more accurate allele determination in microbial communities.

## Abstract

Rapid advancements in sequencing technologies have led to significant progress in microbial genomics, yet challenges persist in accurately identifying microbial strain diversity in metagenomic samples, especially when working with noisy long-read data from platforms like Oxford Nanopore Technologies (ONT). In this article, we introduce NanoMGT, a tool designed to enhance marker gene typing in low-complexity mono-species samples, leveraging the unique properties of long reads. NanoMGT excels in its ability to accurately identify mutations amidst high error rates, ensuring the reliable detection of multiple strain-specific marker genes. Our tool implements a novel scoring system that rewards mutations co-occurring across different reads and penalizes densely grouped, likely erroneous variants, thereby achieving a good balance between sensitivity and precision. A comparative evaluation of NanoMGT, using a simulated multi-strain sample of seven bacterial species, demonstrated superior performance relative to existing tools and the advantages of using a threshold-based filtering approach to calling minority variants in ONT’s sequencing data. NanoMGT’s potential as a post-binning tool in metagenomic pipelines is particularly notable, enabling researchers to more accurately determine specific alleles and understand strain diversity in microbial communities. Our findings have significant implications for clinical diagnostics, environmental microbiology, and the broader field of genomics. The findings offer a reliable and efficient approach to marker gene typing in complex metagenomic samples.
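The scoring idea in the abstract (reward mutations that co-occur across reads, penalize densely grouped variants, then filter by a threshold) can be illustrated with a minimal sketch. This is not NanoMGT's implementation: the `Candidate` type, the function name, and every parameter value (`base_threshold`, `reward`, `penalty`, `window`, `min_shared`) are hypothetical and chosen only to make the example readable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate minority variant at one position of a marker gene (hypothetical type)."""
    position: int      # 0-based position in the marker gene
    depth: int         # number of reads covering the position
    reads: frozenset   # IDs of the reads carrying the minority base


def call_minority_variants(candidates, base_threshold=0.05, reward=0.02,
                           penalty=0.02, window=10, min_shared=2):
    """Return positions whose minority-base frequency passes an adjusted threshold.

    All parameter values are illustrative assumptions, not NanoMGT defaults.
    """
    called = []
    for c in candidates:
        others = [o for o in candidates if o is not c]
        # Reward: other candidates supported by at least `min_shared` of the same reads.
        cooccurring = sum(1 for o in others if len(c.reads & o.reads) >= min_shared)
        # Penalty: other candidates within `window` bases, which suggests clustered errors.
        nearby = sum(1 for o in others if abs(o.position - c.position) <= window)
        threshold = base_threshold - reward * cooccurring + penalty * nearby
        threshold = max(threshold, 0.01)  # never let the threshold drop to zero
        frequency = len(c.reads) / c.depth
        if frequency >= threshold:
            called.append(c.position)
    return called


example = [
    # Two low-frequency variants that share reads: rewarded, so both are called.
    Candidate(100, 100, frozenset({1, 2, 3, 4})),
    Candidate(400, 100, frozenset({1, 2, 3, 5})),
    # Three variants packed within 10 bases on unrelated reads: penalized, so none is called.
    Candidate(700, 100, frozenset(range(10, 16))),
    Candidate(705, 100, frozenset(range(20, 26))),
    Candidate(709, 100, frozenset(range(30, 36))),
    # An isolated variant that clears the unadjusted threshold on its own.
    Candidate(900, 100, frozenset(range(40, 48))),
]

called = call_minority_variants(example)
print(called)  # prints [100, 400, 900]
```

In this toy example the variants at positions 100 and 400 have a frequency of 0.04, below the unadjusted 0.05 threshold, and are called only because they share supporting reads. The cluster at 700 to 709 has a higher frequency (0.06) but is rejected because each member has two close neighbours.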

## Full-text entities

- **Diseases:** infection (MESH:D007239)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11387619/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11387619/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/PMC11387619/full.md

---
Source: https://tomesphere.com/paper/PMC11387619