Protein-to-genome alignment with miniprot
Heng Li

TL;DR
Miniprot is a new protein-to-genome aligner that uses k-mer sketch and SIMD-based dynamic programming to run tens of times faster than existing tools while achieving comparable accuracy on real data.
Contribution
The paper introduces miniprot, a protein-to-genome aligner for mapping protein sequences to a complete genome. It incorporates recent advances, such as k-mer sketch and SIMD-based dynamic programming, that existing tools developed over ten years ago do not use.
Findings
Miniprot is tens of times faster than existing protein-to-genome aligners.
Miniprot achieves accuracy comparable to existing aligners on real data.
The tool is suitable for annotating genes in non-model organisms.
Abstract
Motivation: Protein-to-genome alignment is critical to annotating genes in non-model organisms. While there are a few tools for this purpose, all of them were developed over ten years ago and did not incorporate the latest advances in alignment algorithms. They are inefficient and could not keep up with the rapid production of new genomes and quickly growing protein databases.
Results: Here we describe miniprot, a new aligner for mapping protein sequences to a complete genome. Miniprot integrates recent techniques such as k-mer sketch and SIMD-based dynamic programming. It is tens of times faster than existing tools while achieving comparable accuracy on real data.
Availability and implementation: https://github.com/lh3/miniprot
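The abstract names k-mer sketch as one of the techniques miniprot integrates but does not describe the scheme. As a rough illustration of the general idea only, the Python sketch below picks window minimizers from a protein sequence: within each run of w consecutive k-mers, it keeps the one with the smallest hash. The function names, the hash function, and the values of k and w are assumptions made for this example and are not taken from miniprot.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Stable 64-bit hash of a k-mer (hypothetical choice; any good hash works)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizer_sketch(seq: str, k: int = 6, w: int = 4) -> list[tuple[int, str]]:
    """Return (position, k-mer) pairs chosen as window minimizers.

    Illustrative only: k, w and the hash are assumed values, not miniprot's.
    """
    n_kmers = len(seq) - k + 1
    if n_kmers <= 0:
        return []  # sequence shorter than one k-mer
    hashes = [kmer_hash(seq[i:i + k]) for i in range(n_kmers)]
    picked = set()
    # Slide a window over w consecutive k-mers; keep the smallest hash,
    # breaking ties by leftmost position.
    for start in range(max(1, n_kmers - w + 1)):
        window = range(start, min(start + w, n_kmers))
        picked.add(min(window, key=lambda i: (hashes[i], i)))
    return [(i, seq[i:i + k]) for i in sorted(picked)]


if __name__ == "__main__":
    for pos, kmer in minimizer_sketch("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"):
        print(pos, kmer)
```

Because the same k-mer always hashes to the same value, two sequences that share a stretch of at least w + k - 1 residues select at least one common k-mer, so an index built from the sketch can find candidate matches while storing only a fraction of all k-mers.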
Taxonomy
Topics: Genomics and Phylogenetic Studies · Glycosylation and Glycoproteins Research · Machine Learning in Bioinformatics
