# Efficient Recovery of Complete Gut Viral Genomes by Combined Short‐ and Long‐Read Sequencing

**Authors:** Jingchao Chen, Chuqing Sun, Yanqi Dong, Menglu Jin, Senying Lai, Longhao Jia, Xueyang Zhao, Huarui Wang, Na L. Gao, Peer Bork, Zhi Liu, Wei‐Hua Chen, Xing‐Ming Zhao

PMC · DOI: 10.1002/advs.202305818 · Advanced Science · 2024-01-19

## TL;DR

This study improves gut virome research by assembling a large catalog of nearly complete viral genomes using a combination of short- and long-read sequencing.

## Contribution

A new method combining VLP enrichment and hybrid sequencing significantly increases the completeness and diversity of gut viral genome assemblies.

## Key findings

- The CHGV catalog contains 21,499 non-redundant viral genomes, 35% of which are complete.
- 60% of the CHGV vOTUs were obtained using long-read or hybrid assemblies, with little overlap with short-read-only assemblies.
- The catalog includes 32% novel viral genomes and identifies phages more prevalent than crAssphages and Gubaphages.

## Abstract

Current metagenome‐assembled human gut phage catalogs contain mostly fragmented genomes. Here, a comprehensive gut virome detection procedure is developed that involves virus‐like particle (VLP) enrichment from ≈500 g of feces and combined short‐ and long‐read sequencing. Applied to 135 samples, the procedure yields the Chinese Gut Virome Catalog (CHGV), which consists of 21,499 non‐redundant viral operational taxonomic units (vOTUs). These vOTUs are significantly longer than those obtained by short‐read sequencing and include ≈35% (7,675) complete genomes, roughly nine times the proportion in the Gut Virome Database (GVD, ≈4%, 1,443). Interestingly, the majority (≈60%, 13,356) of the CHGV vOTUs are obtained by either long‐read or hybrid assemblies, with little overlap with those assembled from the short‐read data alone. With this dataset, the vast diversity of the gut virome is elucidated, including the identification of 32% (6,962) novel vOTUs compared with public gut virome databases, dozens of phages that are more prevalent than the crAssphages and/or Gubaphages, and several viral clades that are more diverse than the two. Finally, the functional capacities of the CHGV‐encoded proteins are characterized and a viral‐host interaction network is constructed to facilitate future research and applications.

The CHGV, a human gut virome database, comprises 21,499 non‐redundant phage genomes obtained through virus‐like particle enrichment and combined short‐ and long‐read sequencing. The long reads facilitate the identification of more complete viral genomes and, surprisingly, favor low‐abundance ones. The CHGV features novel genomes and phages more prevalent than the crAssphages and/or Gubaphages, and it enables functional analysis and construction of a viral‐host interaction network, supporting future gut virome research and applications.
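The rounded percentages in the abstract can be checked against the raw counts it quotes. The short sketch below is only an arithmetic illustration, not part of the authors' pipeline: the variable and function names are hypothetical, and the GVD figure is taken as the approximate 4% stated above because the GVD total is not given here.

```python
# Counts quoted in the abstract; all names here are illustrative assumptions.
TOTAL_VOTUS = 21_499

counts = {
    "complete genomes": 7_675,
    "long-read or hybrid assemblies": 13_356,
    "novel vOTUs": 6_962,
}


def share(count, total=TOTAL_VOTUS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {label: share(n) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}% of CHGV vOTUs")

# Approximate fold difference in the proportion of complete genomes,
# using the rounded GVD value (about 4%) quoted in the abstract.
GVD_COMPLETE_PCT = 4.0
fold = round(shares["complete genomes"] / GVD_COMPLETE_PCT, 1)
print(f"CHGV vs GVD complete-genome proportion: about {fold}x")
```

The exact shares come out slightly above the rounded values in the text, which is consistent with the "≈" signs used in the abstract.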

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC10987132/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC10987132/full.md

## References

92 references — full list in the complete paper: https://tomesphere.com/paper/PMC10987132/full.md

---
Source: https://tomesphere.com/paper/PMC10987132