Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly
Heng Li

TL;DR
This paper introduces fermi, a de novo assembler that builds unitigs from short reads and calls SNPs and INDELs from them, with accuracy comparable to or better than standard pipelines on human resequencing data.
Contribution
The study presents fermi, a de novo assembler, together with supporting methods such as the FMD-index for efficient assembly and variant calling, offering a promising alternative to standard pipelines.
Findings
Comparable SNP calling accuracy to standard methods
Improved INDEL detection sensitivity
Higher sensitivity than other de novo assembly approaches
Abstract
Motivation: Eugene Myers in his string graph paper (Myers, 2005) suggested that in a string graph or equivalently a unitig graph, any path spells a valid assembly. As a string/unitig graph also encodes every valid assembly of reads, such a graph, provided that it can be constructed correctly, is in fact a lossless representation of reads. In principle, every analysis based on whole-genome shotgun sequencing (WGS) data, such as SNP and insertion/deletion (INDEL) calling, can also be achieved with unitigs. Results: To explore the feasibility of using de novo assembly in the context of resequencing, we developed a de novo assembler, fermi, that assembles Illumina short reads into unitigs while preserving most of the information of the input reads. SNPs and INDELs can be called by mapping the unitigs against a reference genome. By applying the method to 35-fold human resequencing data, we…
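To make the idea of calling variants from mapped unitigs concrete, here is a minimal Python sketch. It is not fermi's implementation: it assumes a unitig has already been aligned to the reference by some external aligner and that the alignment is available as two equal-length gapped strings. The function name `call_variants`, the tuple layout, and the toy sequences are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# (0-based reference position, kind, reference allele, alternate allele)
Variant = Tuple[int, str, str, str]


def call_variants(ref_aln: str, unitig_aln: str, ref_start: int = 0) -> List[Variant]:
    """Scan one gapped pairwise alignment and report SNPs and INDELs.

    Assumes '-' marks a gap and that no column has a gap in both rows.
    """
    if len(ref_aln) != len(unitig_aln):
        raise ValueError("aligned strings must have equal length")

    variants: List[Variant] = []
    pos = ref_start  # reference coordinate of the current column
    i, n = 0, len(ref_aln)
    while i < n:
        if ref_aln[i] == "-":
            # Gap in the reference: the unitig carries an insertion.
            j = i
            while j < n and ref_aln[j] == "-":
                j += 1
            # Reported at the position of the next reference base.
            variants.append((pos, "INS", "", unitig_aln[i:j]))
            i = j
        elif unitig_aln[i] == "-":
            # Gap in the unitig: reference bases are deleted.
            j = i
            while j < n and unitig_aln[j] == "-":
                j += 1
            variants.append((pos, "DEL", ref_aln[i:j], ""))
            pos += j - i
            i = j
        else:
            # Aligned column: a mismatch is a SNP.
            if ref_aln[i] != unitig_aln[i]:
                variants.append((pos, "SNP", ref_aln[i], unitig_aln[i]))
            pos += 1
            i += 1
    return variants


if __name__ == "__main__":
    # Toy alignment with one SNP, one insertion, and one 2-bp deletion.
    for v in call_variants("ACGT-ACGTA", "ACCTGAC--A"):
        print(v)
```

A real pipeline would also need base or unitig quality, coverage, and heterozygosity handling, none of which this sketch attempts.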
