FermiKit: assembly-based variant calling for Illumina resequencing data
Heng Li

TL;DR
FermiKit is an assembly-based variant calling pipeline for Illumina resequencing data: it assembles short reads de novo, then maps the assembly to a reference genome to call SNPs, INDELs, and SVs.
Contribution
It introduces an assembly-based approach to variant calling in which the assembly serves as a reduced representation of the raw reads, while accuracy stays comparable to current practice.
Findings
Assembles 30-fold human whole-genome data in about one day on a 16-core server, with a peak memory use of 85 GB.
Calls variants in about half an hour with accuracy comparable to current practice.
Retains most original information despite data reduction.
Abstract
Summary: FermiKit is a variant calling pipeline for Illumina data. It de novo assembles short reads and then maps the assembly against a reference genome to call SNPs, short insertions/deletions (INDELs) and structural variations (SVs). FermiKit takes about one day to assemble 30-fold human whole-genome data on a modern 16-core server with 85 GB RAM at the peak, and calls variants in half an hour to an accuracy comparable to the current practice. FermiKit assembly is a reduced representation of raw data while retaining most of the original information.
Availability and implementation: https://github.com/lh3/fermikit
Contact: [email protected]
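To make the "map the assembly against a reference genome to call variants" step concrete, here is a minimal toy sketch in Python. It is not FermiKit's code: the function name `call_variants`, the toy sequences, and the CIGAR string are all made up for illustration, and the sketch assumes a contig has already been aligned to the reference by some aligner. It walks the alignment and reports mismatches as SNPs and gaps as short INDELs, with INDELs anchored on the preceding reference base in the usual VCF style.

```python
import re
from typing import List, Tuple

# (1-based reference position, REF allele, ALT allele)
Variant = Tuple[int, str, str]


def call_variants(ref: str, contig: str, ref_start: int, cigar: str) -> List[Variant]:
    """Walk one contig-to-reference alignment and report the differences.

    ref_start is the 0-based reference offset where the alignment begins.
    Only the CIGAR operations M, I and D are handled in this toy sketch,
    and the alignment is assumed not to start with an insertion or deletion.
    """
    variants: List[Variant] = []
    r, q = ref_start, 0  # current offsets in the reference and in the contig
    for length, op in re.findall(r"(\d+)([MID])", cigar):
        n = int(length)
        if op == "M":
            # Aligned block: every mismatching column is a SNP.
            for k in range(n):
                if ref[r + k] != contig[q + k]:
                    variants.append((r + k + 1, ref[r + k], contig[q + k]))
            r += n
            q += n
        elif op == "I":
            # Extra contig bases: insertion anchored on the previous reference base.
            variants.append((r, ref[r - 1], ref[r - 1] + contig[q:q + n]))
            q += n
        else:
            # op == "D": reference bases missing from the contig.
            variants.append((r, ref[r - 1:r + n], ref[r - 1]))
            r += n
    return variants


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTA"
    # Hypothetical contig carrying one SNP, one 2-bp insertion and one 2-bp deletion.
    assembled = "ACTTTGATCAACTA"
    for pos, ref_allele, alt_allele in call_variants(reference, assembled, 0, "6M2I3M2D3M"):
        print(pos, ref_allele, alt_allele)
```

Because each assembled contig stands in for many overlapping reads, a calling step of this kind handles far fewer sequences than read-based calling would, which is one plausible reading of why the abstract reports variant calling in about half an hour once the assembly exists.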
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genomics and Rare Diseases · RNA and protein synthesis mechanisms
