# AsmMix: an efficient haplotype-resolved hybrid de novo genome assembling pipeline

**Authors:** Chao Liu, Pei Wu, Xue Wu, Xia Zhao, Fang Chen, Xiaofang Cheng, Hongmei Zhu, Ou Wang, Mengyang Xu

PMC · DOI: 10.3389/fgene.2024.1421565 · Frontiers in Genetics · 2024-07-26

## TL;DR

AsmMix is a pipeline that integrates a co-barcoded-read assembly with a long-read assembly to produce diploid genome assemblies that are both contiguous and accurate.

## Contribution

AsmMix introduces a novel pipeline that integrates co-barcoded and long-read data to improve haplotype-resolved genome assembly accuracy and contiguity.

## Key findings

- On multiple synthetic datasets, AsmMix achieves high haplotyping precision and recall across sequencing platforms, coverage depths, read lengths, and read accuracies.
- On a human whole-genome dataset (HG002), the pipeline produces highly contiguous, accurate, haplotype-resolved assemblies, with variant-calling accuracy confirmed against the GIAB benchmarks.

## Abstract

Accurate haplotyping facilitates distinguishing allele-specific expression, identifying cis-regulatory elements, and characterizing genomic variations, which enables more precise investigations into the relationship between genotype and phenotype. Recent advances in third-generation single-molecule long-read and synthetic co-barcoded read sequencing techniques have harnessed long-range information to simplify the assembly graph and improve the assembled genomic sequence. However, it remains methodologically challenging to reconstruct complete haplotypes because of the high sequencing error rates of long reads and the limited capture efficiency of co-barcoded reads. Here we present a pipeline, AsmMix, for generating diploid genomes that are both contiguous and accurate. It first assembles the co-barcoded reads and the long reads separately: the co-barcoded assembly is accurate and haplotype-resolved but may contain many gaps, whereas the long-read assembly is contiguous but susceptible to errors. The two assembly sets are then integrated into haplotype-resolved assemblies with fewer misassemblies. Through extensive evaluation on multiple synthetic datasets, AsmMix consistently demonstrates high precision and recall rates for haplotyping across diverse sequencing platforms, coverage depths, read lengths, and read accuracies, significantly outperforming other existing tools in the field. Furthermore, we validate the effectiveness of our pipeline using a human whole-genome dataset (HG002) and produce highly contiguous, accurate, and haplotype-resolved assemblies. These assemblies are evaluated using the GIAB benchmarks, confirming the accuracy of variant calling. Our results demonstrate that AsmMix offers a straightforward yet highly efficient approach that effectively leverages both long reads and co-barcoded reads for haplotype-resolved assembly.

## Linked entities

- **Species:** Homo sapiens (taxon 9606)

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11310137/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11310137/full.md

## References

64 references — full list in the complete paper: https://tomesphere.com/paper/PMC11310137/full.md

---
Source: https://tomesphere.com/paper/PMC11310137