# Comparison of short-read and long-read metagenome assemblies in a natural soil community highlights systematic bias in recovery of high-diversity populations

**Authors:** Maureen Berg, Taylor Reiter, Joanne Emerson, C Titus Brown, Simon Roux

PMC · DOI: 10.1093/nargab/lqaf163 · NAR Genomics and Bioinformatics · 2025-11-21

## TL;DR

This study compares short-read and long-read sequencing of soil metagenomes, finding that long reads better capture diverse and complex genomic regions that short-read assemblies miss.

## Contribution

The study identifies low coverage and high diversity as key factors causing misassemblies in short-read metagenomes and highlights the complementary role of long-read sequencing.

## Key findings

- Short-read assemblies miss variable genomic regions, such as integrated viruses and defense system islands, mainly because of low coverage and high sequence diversity.
- Long-read sequencing improves contiguity and recovery of complex genome regions in soil metagenomes.
- Missed regions in short-read assemblies are often functionally important and diverse.
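
As a rough illustration of the two factors named above, the sketch below flags windows of a long-read contig where mapped short reads show low mean depth or high per-site nucleotide diversity. It is not the authors' pipeline: the function names, the window size, and both thresholds (`min_depth`, `max_pi`) are hypothetical, and the inputs (per-position depth and per-position allele counts) are assumed to have been computed beforehand from a short-read alignment.

```python
from statistics import mean


def nucleotide_diversity(allele_counts):
    """Probability that two reads drawn without replacement differ at one site."""
    n = sum(allele_counts.values())
    if n < 2:
        return 0.0  # diversity is undefined with fewer than two reads; treat as 0
    same = sum(c * (c - 1) for c in allele_counts.values())
    return 1.0 - same / (n * (n - 1))


def flag_windows(depth, site_counts, window=1000, min_depth=5.0, max_pi=0.01):
    """Flag contig windows with low short-read depth or high diversity.

    depth       : list of short-read depths, one per contig position
    site_counts : list of dicts mapping base -> read count, one per position
    window, min_depth, max_pi : illustrative values, not taken from the paper
    """
    flags = []
    for start in range(0, len(depth), window):
        end = min(start + window, len(depth))
        mean_depth = mean(depth[start:end])
        mean_pi = mean(nucleotide_diversity(c) for c in site_counts[start:end])
        reasons = []
        if mean_depth < min_depth:
            reasons.append("low_coverage")
        if mean_pi > max_pi:
            reasons.append("high_diversity")
        flags.append((start, end, mean_depth, mean_pi, reasons))
    return flags


if __name__ == "__main__":
    # Toy contig of six positions, scanned in windows of two positions.
    toy_depth = [10, 10, 2, 2, 10, 10]
    toy_counts = [
        {"A": 10}, {"A": 10},          # well covered, no variation
        {"A": 2}, {"A": 2},            # poorly covered
        {"A": 5, "G": 5}, {"A": 10},   # well covered, one highly variable site
    ]
    for start, end, d, pi, reasons in flag_windows(toy_depth, toy_counts, window=2):
        print(start, end, round(d, 2), round(pi, 4), reasons)
```

Windows flagged this way would be candidates for the regions a short-read assembler is likely to fragment or drop; suitable thresholds depend on sequencing depth and community complexity.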

## Abstract

Comparisons of long-read and short-read (meta)genome assemblies typically show that short-read sequence assemblies are less error-prone, but struggle to assemble complicated genome regions (e.g. repeats) compared to long-read sequence assemblies. Accurate metagenome assembly is especially challenging in diverse environments, such as soil, and long-read sequencing has been shown to improve assembly. Here, we use metagenomic data with paired long-read and short-read sequences to identify specific factors that impact genome assembly and assess their relative importance in a natural soil community. Our analysis suggests that low coverage and high sequence diversity are the two main factors leading to misassemblies in short-read data, and many of these “missed” regions tend to be variable parts of the genome, such as integrated viruses or defense system islands. Taken together, our results demonstrate that short-read metagenomes can possibly underestimate the diversity of these genome regions and that long-read sequencing can complement short-read metagenomes by improving assembly contiguity and the recovery of variable regions.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12634412/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12634412/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/PMC12634412/full.md

---
Source: https://tomesphere.com/paper/PMC12634412