Fast Characterization of Segmental Duplications in Genome Assemblies
Ibrahim Numanagić, Alim S. Gökkaya, Lillian Zhang, Bonnie Berger, Can Alkan, Faraz Hach

TL;DR
SEDEF is a fast, accurate tool for identifying segmental duplications in genome assemblies. It improves on previous methods by detecting more divergent duplications (up to 25% pairwise error) and by greatly reducing analysis time.
Contribution
We developed SEDEF, a novel algorithm that rapidly and accurately detects segmental duplications in genome assemblies using advanced filtering and chaining techniques.
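The summary does not describe SEDEF's chaining step in detail, so the following is only a generic illustration of colinear seed chaining, not SEDEF's algorithm. Exact matches (seeds) between two segments are sorted and linked by dynamic programming into the highest-scoring chain that advances on both sequences. The `Seed` type, the `max_gap` parameter, and the coverage-based score are hypothetical choices made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Seed:
    """Hypothetical exact match of length k at position x in segment A and y in segment B."""
    x: int
    y: int
    k: int


def chain_seeds(seeds: List[Seed], max_gap: int = 250) -> List[Seed]:
    """Return the best colinear chain of seeds (generic O(n^2) DP sketch).

    Score = total bases covered by the chained, non-overlapping seeds.
    max_gap is a hypothetical limit on the distance between consecutive seeds.
    """
    if not seeds:
        return []
    seeds = sorted(seeds, key=lambda s: (s.x, s.y))
    n = len(seeds)
    score = [s.k for s in seeds]  # best chain score ending at each seed
    prev = [-1] * n               # back-pointers for traceback
    for i in range(n):
        cur = seeds[i]
        for j in range(i):
            before = seeds[j]
            # The earlier seed must end before the current one starts on both axes.
            if before.x + before.k <= cur.x and before.y + before.k <= cur.y:
                gap = max(cur.x - before.x - before.k, cur.y - before.y - before.k)
                if gap <= max_gap and score[j] + cur.k > score[i]:
                    score[i] = score[j] + cur.k
                    prev[i] = j
    # Trace back from the highest-scoring chain end.
    idx = max(range(n), key=lambda i: score[i])
    chain = []
    while idx != -1:
        chain.append(seeds[idx])
        idx = prev[idx]
    return chain[::-1]
```

The quadratic loop keeps the sketch short; practical chainers usually replace it with range-query data structures to scale to millions of seeds.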
Findings
SEDEF detects SDs with high accuracy and speed.
It detects duplications whose copies differ by up to 25% pairwise error, enabling deeper analysis of genome evolution.
SEDEF runs in minutes, whereas previous tools took weeks.
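To make "pairwise error" concrete, this sketch reads it as the fraction of alignment columns in which two aligned copies differ (a mismatch or a gap against a base). That is one common convention; others ignore gaps or count each gap run once, and the source does not say which one SEDEF uses. The function names and the `max_error` parameter are hypothetical.

```python
def pairwise_error(aln_a: str, aln_b: str) -> float:
    """Fraction of alignment columns that are mismatches or base-vs-gap.

    Both inputs must be rows of the same gapped alignment, with '-' as the
    gap character. Gap-vs-gap columns carry no information and are skipped.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned rows must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aln_a.upper(), aln_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A column is an error if the characters differ (this includes base vs gap).
    errors = sum(1 for a, b in columns if a != b)
    return errors / len(columns)


def within_error_bound(aln_a: str, aln_b: str, max_error: float = 0.25) -> bool:
    """True if the aligned pair's error does not exceed max_error (25% by default)."""
    return pairwise_error(aln_a, aln_b) <= max_error
```

For example, `pairwise_error("ACGTACGTAC", "ACGAACG-AC")` has one mismatch and one gap in 10 columns, giving 0.2, which falls within a 25% bound.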
Abstract
Segmental duplications (SDs), or low-copy repeats (LCRs), are segments of DNA longer than 1 kbp with high sequence identity that are copied to other regions of the genome. SDs are among the most important sources of evolution, a common cause of genomic structural variation, and several are associated with diseases of genomic origin. Despite their functional importance, SDs present one of the major hurdles for de novo genome assembly because of the ambiguity they cause when building and traversing both state-of-the-art overlap-layout-consensus graphs and de Bruijn graphs. As a result, SD regions are misassembled, collapsed into a single copy, or missing entirely from the assembled reference genomes of various organisms. In turn, this missing or incorrect information limits our ability to fully understand the evolution and architecture of genomes. Despite the essential need to…
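The SD definition in the abstract can be expressed as a simple filter over candidate segment pairs. In this sketch the `DupPair` record and the function names are hypothetical; the 1 kbp length cutoff comes from the abstract, while the identity cutoff is a parameter. A value of 0.90 is the conventional threshold used in earlier SD catalogs, and 0.75 corresponds to the 25% pairwise error SEDEF is reported to capture.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DupPair:
    """Hypothetical candidate pair of similar segments.

    Coordinates are 0-based, half-open [start, end).
    """
    chrom_a: str
    start_a: int
    end_a: int
    chrom_b: str
    start_b: int
    end_b: int
    identity: float  # fraction in [0, 1], e.g. 1 - pairwise error


def is_sd(p: DupPair, min_len: int = 1000, min_identity: float = 0.90) -> bool:
    """True if both copies are longer than min_len bp and identity is high enough."""
    len_a = p.end_a - p.start_a
    len_b = p.end_b - p.start_b
    return len_a > min_len and len_b > min_len and p.identity >= min_identity


def filter_sds(pairs: Iterable[DupPair], min_len: int = 1000,
               min_identity: float = 0.90) -> List[DupPair]:
    """Keep only the pairs that satisfy the length and identity cutoffs."""
    return [p for p in pairs if is_sd(p, min_len, min_identity)]
```

Lowering `min_identity` from 0.90 to 0.75 keeps the more diverged pairs that a 10% error cutoff would discard.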
