# Toward a Kinh Vietnamese Reference Genome: Constructing a De Novo Genome Assembly Using Long-Read Sequencing and Optical Mapping

**Authors:** Le Thi Dung, Le Tung Lam, Nguyen Hong Trang, Nguyen Vu Hung Anh, Nguyen Ngoc Nam, Doan Thi Nhung, Tran Huyen Linh, Le Ngoc Giang, Hoang Ha, Nguyen Quang Huy, Truong Nam Hai

PMC · DOI: 10.3390/genes16050536 · Genes · 2025-04-29

## TL;DR

This study builds a high-quality de novo reference genome for the Kinh Vietnamese population by combining PacBio long-read sequencing with Bionano optical mapping, then polishing the assembly with Kinh Vietnamese short-read whole-genome sequences.

## Contribution

The paper introduces a new Kinh Vietnamese reference genome (VHG1.2) using long-read and optical mapping technologies.

## Key findings

- The VHG1.2 assembly spans 3.22 Gbp and shows high accuracy (QV: 48), completeness (BUSCO: 92%), and continuity (295 super scaffolds, super scaffold N50: 50 Mbp).
- Variant calling with multiple bioinformatic tools gave notably different results when VHG1.2, rather than the standard reference genome hg38, was used as the reference.
- The hybrid sequencing approach proved effective for de novo assembly of population-specific genomes.
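
The continuity figure above is an N50, which can be easy to misread. As a minimal sketch (the function name `n50` and the toy lengths are illustrative assumptions, not taken from the paper), N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Sort lengths from longest to shortest and walk down the list until
    the running total reaches half of the summed length; the length at
    which that happens is the N50.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy scaffold lengths, NOT data from the VHG1.2 assembly.
toy_scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(toy_scaffolds))  # prints 70 (80 + 70 = 150 >= 290 / 2)
```

Because N50 is weighted by length, it can sit well above the mean scaffold length: a few chromosome-scale scaffolds can cover half of an assembly even when many short scaffolds remain.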

## Abstract

Background: Population-specific reference genomes are essential for improving the accuracy and reliability of genomic analyses across diverse human populations. Although Vietnam ranks as the 16th most populous country in the world, with more than 86% of its population identifying as Kinh, studies specifically focusing on the Kinh Vietnamese reference genome remain scarce. Therefore, constructing a Kinh Vietnamese reference genome is valuable for genetic research on the Vietnamese population. Methods: In this study, we combined PacBio long-read sequencing and Bionano optical mapping data to generate a de novo assembly of a Kinh Vietnamese genome (VHG), which was subsequently polished using multiple Kinh Vietnamese short-read whole-genome sequences (WGSs). Results: The final assembly, named VHG1.2, comprised 3.22 gigabase pairs of high-quality sequence data, demonstrating high accuracy (QV: 48), completeness (BUSCO: 92%), and continuity (295 super scaffolds, super scaffold N50: 50 Mbp). Using multiple bioinformatic tools for variant calling, we observed significant variants when the population-specific reference VHG1.2 was used compared to the standard reference genome hg38. Conclusions: Overall, our genome assembly demonstrates the advantages of a long-read hybrid sequencing approach for de novo assembly and highlights the benefit of using population-specific reference genomes in population genomic analysis.
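
To put the reported accuracy in concrete terms, the sketch below converts a quality value to a per-base error rate. It assumes the QV in the abstract follows the standard Phred scale, QV = -10 · log10(P_error); this summary does not say which tool produced the value, and the function names are illustrative.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error probability back to a Phred-scaled QV."""
    return -10 * math.log10(error_rate)


# Assumption: the abstract's QV of 48 is Phred-scaled consensus quality.
rate = qv_to_error_rate(48)
print(f"{rate:.2e}")              # prints 1.58e-05
print(f"{rate * 1_000_000:.1f}")  # prints 15.8 (expected errors per Mbp)
```

Under that assumption, QV 48 corresponds to roughly one expected base error per 63 kbp of assembled sequence.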

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12111184/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12111184/full.md

## References

66 references — full list in the complete paper: https://tomesphere.com/paper/PMC12111184/full.md

---
Source: https://tomesphere.com/paper/PMC12111184