More efficient PBWT prefix-array access via batching

Travis Gagie

arXiv:2605.15819 · cs.DS · May 18, 2026

TL;DR

This paper introduces a batching approach to improve the efficiency of accessing the PBWT prefix array, enabling faster haplotype matching in genetic data analysis.

Contribution

It proposes a novel batching method that reduces space and time complexity for PBWT prefix-array access during haplotype matching.
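As background for what "prefix-array access" refers to, here is a minimal Python sketch that materializes every PBWT prefix array of a small panel. It is not code from the paper. It assumes haplotypes are equal-length lists of binary alleles (0/1), and the function name `prefix_arrays` is hypothetical. Storing every array explicitly costs one entry per haplotype per column, which is the space cost that compressed PBWT representations try to avoid.

```python
from typing import List


def prefix_arrays(panel: List[List[int]]) -> List[List[int]]:
    """Return [a_0, a_1, ..., a_n] for a binary panel of m haplotypes of length n.

    a_k lists haplotype indices sorted by their reversed prefixes
    panel[h][k-1], panel[h][k-2], ..., panel[h][0]; ties keep index order.
    Assumes every allele is 0 or 1 (hypothetical input format).
    """
    m = len(panel)
    n = len(panel[0]) if m else 0
    a = list(range(m))  # a_0: all empty prefixes are equal, so identity order
    arrays = [a]
    for k in range(n):
        # Standard PBWT update (Durbin, 2014): stably partition a_k by the
        # allele at column k; the result is sorted by reversed prefix of length k+1.
        zeros = [h for h in a if panel[h][k] == 0]
        ones = [h for h in a if panel[h][k] == 1]
        a = zeros + ones
        arrays.append(a)
    return arrays
```

For example, `prefix_arrays([[0,1,0,1],[1,1,0,0],[0,1,1,1],[0,0,0,1]])[2]` is `[3, 0, 2, 1]`. Haplotype 3 has reversed prefix (0, 0) and sorts first, and haplotypes 0 and 2 tie on (1, 0), so they stay in index order.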

Findings

01

Batching queries can significantly reduce space and time complexity.

02

The method achieves constant-time haplotype reporting per substring.

03

It improves upon recent time-space tradeoffs in PBWT query processing.

Abstract

The positional Burrows-Wheeler Transform (PBWT) is commonly used to store haplotype panels compactly in such a way that, given a query haplotype, we can quickly find the set-maximal exact matches (SMEMs) between the query and the haplotypes in a panel. There are generally two steps in this process: first we find the maximal substrings of the query that occur in the same positions in haplotypes in the panel and then, for each such substring, report the haplotypes in the panel in which the substring occurs in the same position as in the query. Very recently, Bonizzoni, Gagie and Gao (2026) gave two time-space tradeoffs for the second step: they use either $O((r + h) \log n)$ bits and $O(\log \log \min(h, \ell) + k)$ time to report $k$ haplotypes in the panel, or $O(r \log h + h \log n)$ bits and $O(k \log \log h)$ time, where $r$ is the number of runs in the panel's PBWT and $h$, …
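To make the second step concrete, the following minimal Python sketch answers a batch of reporting queries offline in one left-to-right sweep over the columns, keeping only the current prefix array in memory. It is a generic illustration of batching under stated assumptions, not the construction from this paper or from Bonizzoni, Gagie and Gao. It assumes binary alleles and hypothetical `(i, j, q)` query triples meaning "report every haplotype that agrees with `q` on columns `i` through `j - 1`". It relies on one property: haplotypes matching `q` on that window are contiguous in the prefix array at column `j`, so two binary searches find them.

```python
from typing import Dict, List, Sequence, Tuple

Query = Tuple[int, int, Sequence[int]]  # hypothetical format: (start i, end j, query q)


def _reverse_key(seq: Sequence[int], i: int, j: int) -> Tuple[int, ...]:
    # seq[j-1], seq[j-2], ..., seq[i]: the PBWT sort key at column j, cut to [i, j).
    return tuple(seq[c] for c in range(j - 1, i - 1, -1))


def batched_report(panel: List[List[int]], queries: List[Query]) -> Dict[int, List[int]]:
    """Map query number t to the sorted haplotypes h with panel[h][i:j] == q[i:j].

    Assumes 0 <= i <= j <= n for every query. Only one prefix array is stored.
    """
    m = len(panel)
    n = len(panel[0]) if m else 0
    by_end: Dict[int, List[int]] = {}
    for t, (_, j, _) in enumerate(queries):
        by_end.setdefault(j, []).append(t)  # batch queries by end column
    answers: Dict[int, List[int]] = {}
    a = list(range(m))  # prefix array a_0
    for k in range(n + 1):
        # Invariant: a is sorted by reversed prefix panel[h][k-1], ..., panel[h][0].
        for t in by_end.get(k, []):
            i, j, q = queries[t]
            target = _reverse_key(q, i, j)
            lo, hi = 0, m
            while lo < hi:  # first position whose truncated key is >= target
                mid = (lo + hi) // 2
                if _reverse_key(panel[a[mid]], i, j) < target:
                    lo = mid + 1
                else:
                    hi = mid
            start, hi = lo, m
            while lo < hi:  # first position whose truncated key is > target
                mid = (lo + hi) // 2
                if _reverse_key(panel[a[mid]], i, j) <= target:
                    lo = mid + 1
                else:
                    hi = mid
            answers[t] = sorted(a[start:lo])
        if k < n:
            # Stable partition on column k turns a_k into a_{k+1}.
            a = [h for h in a if panel[h][k] == 0] + [h for h in a if panel[h][k] == 1]
    return answers
```

Grouping queries by end column is what lets a single sweep serve all of them without storing every prefix array. Each binary-search step compares up to `j - i` symbols, so this sketch favors clarity over the compressed space and time bounds quoted above.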
