# RAGA: a reference-assisted genome assembly tool for efficient population-scale assembly

**Authors:** Ru-Peng Zhao, Yu-Hong Luo, Wen-Zhao Xie, Zu-Wen Zhou, Yong-Qing Qian, Si-Long Yuan, Dong-Ao Li, Jiana Li, Kun Lu, Xingtan Zhang, Jia-Ming Song, Ling-Ling Chen

PMC12577851 · DOI: 10.1093/hr/uhaf207 · Horticulture Research · 2025-08-11

## TL;DR

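RAGA is a tool that improves genome assembly by combining existing reference genomes from the same or closely related species with PacBio HiFi reads, then integrating the resulting long sequences with de novo assemblies to make population-scale genomic studies more efficient.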

## Contribution

RAGA introduces a hybrid computational method that integrates de novo and reference-based assembly for population-scale genome studies.

## Key findings

- RAGA reduces the number of contigs and gaps in genome assemblies.
- The tool corrects genome assembly errors and improves quality across plant genomes.
- RAGA streamlines population-scale assembly workflows for pan-genomic research.
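
The findings above are stated in terms of contig and gap counts. As a rough illustration of how such metrics might be measured on an assembly before and after improvement, here is a minimal sketch. It is not part of RAGA and does not reproduce the authors' evaluation. The function names `parse_fasta` and `assembly_stats` are hypothetical, and the sketch assumes that a gap is any uninterrupted run of `N` bases within a sequence.

```python
import re
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {sequence name: sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Use the first header token as the name; fall back to a counter.
            name = header[0] if header else f"seq{len(records) + 1}"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def assembly_stats(sequences: Dict[str, str]) -> Dict[str, int]:
    """Summarise an assembly: sequence count, gap count, total length, N50.

    Assumption (not from the paper): a gap is a maximal run of N/n bases.
    """
    lengths = sorted((len(s) for s in sequences.values()), reverse=True)
    gaps = sum(len(re.findall(r"[Nn]+", s)) for s in sequences.values())
    total = sum(lengths)
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if running * 2 >= total:  # first length covering half the assembly
            n50 = length
            break
    return {
        "sequences": len(lengths),
        "gaps": gaps,
        "total_length": total,
        "n50": n50,
    }


if __name__ == "__main__":
    example = ">a\nACGTNNNNACGT\nNNAC\n>b desc\nACGT\n"
    print(assembly_stats(parse_fasta(example)))
```

Running the same summary on an assembly before and after an improvement step gives a simple way to check whether the number of sequences and gaps actually dropped. Correcting assembly errors, the third finding, needs alignment-based evaluation and is outside the scope of this sketch.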

## Abstract

High-quality reference genomes at the population scale are fundamental for advancing pan-genomic research. However, high-quality genome assembly at the population scale is costly and time-consuming. To overcome these limitations, we developed Reference-Assisted Genome Assembly (RAGA), a hybrid computational tool that combines de novo and reference-based assembly approaches. RAGA efficiently employs existing reference genomes from the same or closely related species in combination with PacBio HiFi reads to produce high-quality alternative long sequences. These sequences can be integrated with de novo assemblies to improve assembly quality across population-scale datasets. The performance of RAGA across various plant genomes demonstrated its ability to reduce the number of contigs, decrease gaps, and correct genome assembly errors. The implementation of RAGA (available at https://github.com/wzxie/RAGA) significantly streamlines population-scale genome assembly workflows, providing a robust foundation for comprehensive pan-genomic investigations. This tool represents a substantial advancement in making large-scale genomic studies more accessible and efficient.

## Full-text entities

- **Chemicals:** RAGA (-)
- **Species:** Arabidopsis thaliana (mouse-ear cress, species) [taxon 3702], Glycine max (soybean, species) [taxon 3847], Echinochloa colona (corn panic grass, species) [taxon 90396], Citrus australis (Australian lime, species) [taxon 341934], Manihot esculenta (cassava, species) [taxon 3983], Phaseolus vulgaris (common bean, species) [taxon 3885], Fragaria x ananassa (strawberry, species) [taxon 3747], watermelon [taxon 260674], Musa acuminata (banana, species) [taxon 4641], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932], Tripidium rufipilum (species) [taxon 908848], Homo sapiens (human, species) [taxon 9606], Pyrus communis (pear, species) [taxon 23211], Saccharum spontaneum (fodder cane, species) [taxon 62335], Oryza sativa (Asian cultivated rice, species) [taxon 4530], C. australis [taxon 54191], Citrus x limon (lemon, species) [taxon 2708], Euphorbia peplus (petty spurge, species) [taxon 38846], Triticum aestivum (bread wheat, species) [taxon 4565]
- **Mutations:** T2T
- **Cell lines:** MH63 — Mus musculus (Mouse), Hybridoma (CVCL_J223)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12577851/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12577851/full.md

## References

68 references — full list in the complete paper: https://tomesphere.com/paper/PMC12577851/full.md

---
Source: https://tomesphere.com/paper/PMC12577851