Fast Iteration of Spaced k-mers

Lucas Czech

arXiv:2603.25417·q-bio.GN·May 15, 2026

TL;DR

This paper introduces highly efficient algorithms for extracting spaced k-mers from nucleotide sequences, significantly improving speed and performance in bioinformatics applications through CPU-level bit manipulation techniques.

Contribution

The author develops optimized, hardware-aware algorithms for spaced k-mer extraction that are up to an order of magnitude faster than existing methods.

Findings

01

The algorithms reach a throughput of up to 750 MB/s per core.

02

Implementation is simple, fast, and publicly available.

03

Addresses a common inefficiency: the additional work needed to extract spaced k-mers from a sequence, which has slowed down existing methods.

Abstract

Background: Short sequence substrings of a fixed length k, called k-mers, are a ubiquitous computational primitive in bioinformatics, used across sequence indexing, read mapping, genome assembly, metagenomic classification, and comparative genomics. Spaced k-mers generalize this concept by selecting only a subset of positions within a k-mer, improving robustness to mismatches and sequencing errors. While k-mers are computationally highly efficient, spaced k-mers require additional work to be extracted from a sequence, which has slowed down existing methods.

Results: We present a collection of efficient algorithms for extracting spaced k-mers from nucleotide sequences, optimized for different hardware architectures. They are based on bit manipulation instructions at CPU level, making them both simpler to implement and up to an order of magnitude faster than existing methods. We further…
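To make the idea of a spaced k-mer concrete, the following is a minimal sketch, not the paper's implementation. It assumes a 2-bit encoding per nucleotide (A=0, C=1, G=2, T=3), a position pattern written as a string of 1s and 0s, and a portable software stand-in for a parallel bit extraction instruction such as x86 BMI2 PEXT. All function names (`pext`, `expand_mask`, `spaced_kmers`, `decode`) are hypothetical and are not taken from the paper or its repository.

```python
# Illustrative sketch only: helper names and the 2-bit encoding are assumptions.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pext(value: int, mask: int) -> int:
    """Software stand-in for a parallel bit extract (e.g. x86 BMI2 PEXT):
    gather the bits of `value` selected by `mask` into the low bits."""
    result = 0
    out_bit = 0
    while mask:
        lowest = mask & -mask      # isolate the lowest set bit of the mask
        if value & lowest:
            result |= 1 << out_bit
        out_bit += 1
        mask &= mask - 1           # clear that bit and continue
    return result


def expand_mask(pattern: str) -> int:
    """Turn a position pattern such as '1101' into a 2-bit-per-base mask."""
    mask = 0
    for ch in pattern:
        mask = (mask << 2) | (0b11 if ch == "1" else 0b00)
    return mask


def spaced_kmers(seq: str, pattern: str):
    """Yield the 2-bit encoded spaced k-mer for every window of `seq`.
    Assumes `seq` contains only A, C, G, T (anything else raises KeyError)."""
    k = len(pattern)
    mask = expand_mask(pattern)
    window = (1 << (2 * k)) - 1    # keeps only the last k bases
    kmer = 0
    for i, base in enumerate(seq):
        kmer = ((kmer << 2) | ENCODE[base]) & window   # rolling update
        if i >= k - 1:
            yield pext(kmer, mask)


def decode(value: int, weight: int) -> str:
    """Decode a 2-bit encoded word of `weight` bases back into a string."""
    return "".join(
        "ACGT"[(value >> (2 * (weight - 1 - j))) & 3] for j in range(weight)
    )


if __name__ == "__main__":
    pattern = "1101"               # keep positions 0, 1 and 3 of each 4-mer
    weight = pattern.count("1")
    for word in spaced_kmers("ACGTAC", pattern):
        print(decode(word, weight))
```

In this sketch the regular k-mer is updated with one shift and one OR per base, and the spaced k-mer is obtained from it with a single extraction step. On hardware that offers a bit extraction instruction, the `pext` loop would presumably collapse into one instruction per window; whether and how the paper's algorithms do this on each architecture is not shown here.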

Code & Models

Repositories

lczech/fisk (GitHub)
