On the Coverage Required for Diploid Genome Assembly

Daanish Mahajan; Chirag Jain; Navin Kashyap

arXiv:2405.05734 · cs.IT · April 8, 2025

TL;DR

This paper studies, from an information-theoretic standpoint, the coverage and read length required for complete diploid genome assembly. It finds that the standard greedy and de Bruijn graph-based algorithms need considerably higher coverage and read length than the lower bound, because both require the double repeats in the genome to be bridged.

Contribution

It provides the first information-theoretic analysis of coverage needs for diploid genome assembly and evaluates the limitations of common assembly algorithms.

Findings

01 The greedy and de Bruijn graph-based assembly algorithms require considerably higher coverage and read length than the information-theoretic lower bound.

02 The gap arises because both algorithms require the double repeats in the genome to be bridged.

03 Necessary conditions for overlap graph-based assembly are derived.

Abstract

The repeat content and heterozygosity rate of a target genome are important factors in determining the feasibility of achieving a complete telomere-to-telomere assembly. The mathematical relationship between the required coverage and read length for the purpose of unique reconstruction remains unexplored for diploid genomes. We investigate the information-theoretic conditions that the given set of sequencing reads must satisfy to achieve the complete reconstruction of the true sequence of a diploid genome. We also analyze the standard greedy and de-Bruijn graph-based assembly algorithms. Our results show that the coverage and read length requirements of the assembly algorithms are considerably higher than the lower bound because both algorithms require the double repeats in the genome to be bridged. Finally, we derive the necessary conditions for the overlap graph-based assembly…
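The two quantities the abstract relates, coverage and repeat bridging, can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the half-open coordinate convention, and the usual definition of a bridged repeat (some read contains the repeat copy plus at least one flanking base on each side) are assumptions made here for illustration. The paper's notion of a double repeat is not modeled.

```python
def coverage(num_reads, read_length, genome_length):
    """Average coverage depth: total sequenced bases per genome base."""
    return num_reads * read_length / genome_length


def is_bridged(repeat_start, repeat_end, read_starts, read_length):
    """Return True if some read bridges the repeat copy.

    Assumed convention: the repeat copy occupies the half-open interval
    [repeat_start, repeat_end) and a read starting at s occupies
    [s, s + read_length). A read bridges the copy if it contains it and
    extends by at least one base on both sides.
    """
    for s in read_starts:
        starts_before = s < repeat_start              # at least one left flanking base
        ends_after = s + read_length > repeat_end     # at least one right flanking base
        if starts_before and ends_after:
            return True
    return False
```

For example, with reads of length 20, a repeat copy at `[10, 20)` is bridged by a read starting at position 5 but not by one starting at position 10, since the latter has no left flanking base.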

Taxonomy

Topics: Chromosomal and Genetic Variations · Evolutionary Algorithms and Applications · DNA and Biological Computing