TL;DR
Minimap2 is a versatile, fast, and accurate alignment tool designed for mapping various types of long and short nucleotide sequences against large reference genomes, outperforming existing aligners in speed and accuracy.
Contribution
It introduces a new alignment algorithm capable of handling ultra-long reads, noisy data, and large genomes efficiently, with novel heuristics to improve accuracy and reduce spurious matches.
Findings
3-4 times faster than mainstream short-read mappers
Over 30 times faster at higher accuracy for long reads
Effective for diverse sequencing data types
Abstract
Motivation: Recent advances in sequencing technologies promise ultra-long reads of 100 kilobases (kb) on average, full-length mRNA or cDNA reads in high throughput, and genomic contigs over 100 megabases (Mb) in length. Existing alignment programs either cannot process such data at scale or do so inefficiently, which creates a pressing need for new alignment algorithms. Results: Minimap2 is a general-purpose alignment program that maps DNA or long mRNA sequences against a large reference database. It works with accurate short reads of 100bp in length, 1kb genomic reads at an error rate of 15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes hundreds of megabases in length. Minimap2 performs split-read alignment, employs a concave gap cost for long insertions and deletions (INDELs) and introduces new heuristics to reduce…
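The "concave gap cost" mentioned in the abstract can be illustrated with a two-piece affine function, γ(l) = min(q + l·e, q̃ + l·ẽ), where the second piece has a larger opening penalty (q̃ > q) but a smaller per-base extension penalty (ẽ < e). Taking the minimum of two lines gives a concave curve, so each additional base of a long INDEL costs less than it would under a plain affine penalty. The sketch below assumes this form; the function names and the parameter values (q=4, e=2, q̃=24, ẽ=1) are illustrative assumptions rather than a definitive account of the implementation.

```python
def affine_cost(length: int, q: int = 4, e: int = 2) -> int:
    """Plain affine gap cost: open once, then pay e per gap base (assumed values)."""
    if length <= 0:
        return 0
    return q + length * e


def two_piece_gap_cost(length: int, q: int = 4, e: int = 2,
                       q2: int = 24, e2: int = 1) -> int:
    """Concave gap cost: the cheaper of two affine pieces (illustrative parameters)."""
    if length <= 0:
        return 0
    return min(q + length * e, q2 + length * e2)


if __name__ == "__main__":
    # Compare the two penalties for short and long INDELs.
    for gap_len in (1, 5, 20, 100):
        print(gap_len, affine_cost(gap_len), two_piece_gap_cost(gap_len))
```

With these assumed values the two pieces cross at l = 20 (4 + 2·20 = 24 + 20 = 44). A 100bp deletion then costs 124 under the concave penalty versus 204 under the single affine penalty, which is why a concave cost lets an aligner keep a long INDEL in one alignment instead of breaking it apart.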
