Guide to k-mer approaches for genomics across the tree of life
Katharine M. Jenike, Lucía Campos-Domínguez, Marilou Boddé, José Cerca, Christina N. Hodson, Michael C. Schatz, Kamil S. Jaron

TL;DR
This paper provides a comprehensive review of k-mer-based methods in biodiversity genomics, highlighting their theoretical foundations and practical applications for analyzing diverse genomic data.
Contribution
It offers the first detailed overview of k-mer approaches, serving as a reference manual for their use in genomics across the tree of life.
Findings
K-mer methods enable rapid analysis of complex sequencing datasets.
They can help overcome some of the technical and biological challenges of genome sequencing.
The review consolidates theoretical and practical insights for biodiversity genomics.
Abstract
The wide array of currently available genomes displays a wonderful diversity in size, composition and structure, with many more to come thanks to several global biodiversity genomics initiatives launched in recent years. However, sequencing genomes, even with all the recent advances, can still be challenging for both technical reasons (e.g. small physical size, contaminated samples, or access to appropriate sequencing platforms) and biological reasons (e.g. germline-restricted DNA, variable ploidy levels, sex chromosomes, or very large genomes). In recent years, k-mer-based techniques have become popular for overcoming some of these challenges. They are based on the simple process of dividing the analysed sequences (e.g. raw reads or genomes) into a set of sub-sequences of length k, called k-mers. Despite this apparent simplicity, k-mer-based analysis allows for a rapid and intuitive assessment…
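The k-mer decomposition described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names `kmers`, `canonical` and `count_kmers` are hypothetical, and collapsing each k-mer with its reverse complement ("canonical" k-mers) is an assumption reflecting a common convention for double-stranded DNA reads.

```python
from collections import Counter

# Complement table for the four DNA bases; other characters (e.g. N) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmers(seq: str, k: int):
    """Yield every overlapping substring of length k.

    A sequence of length L yields L - k + 1 k-mers (none if L < k).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement.

    Assumption: reads may come from either DNA strand, so a k-mer and its
    reverse complement are treated as the same k-mer.
    """
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(reads, k: int, use_canonical: bool = True) -> Counter:
    """Count k-mers across a collection of reads (hypothetical helper)."""
    counts = Counter()
    for read in reads:
        for km in kmers(read.upper(), k):
            counts[canonical(km) if use_canonical else km] += 1
    return counts


if __name__ == "__main__":
    print(list(kmers("ACGTAC", 3)))  # ['ACG', 'CGT', 'GTA', 'TAC']
    # With canonical k-mers, ACG/CGT and GTA/TAC collapse into pairs.
    print(count_kmers(["ACGTAC"], 3))
```

The sketch favours readability over speed: dedicated k-mer counters perform the same operation with memory-efficient data structures so that whole sequencing datasets can be processed.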
