More efficient PBWT prefix-array access via batching
Travis Gagie

TL;DR
This paper introduces a batching approach that makes access to the PBWT prefix array more efficient, enabling faster haplotype matching in genetic data analysis.
Contribution
It proposes a novel batching method that reduces space and time complexity for PBWT prefix-array access during haplotype matching.
Findings
Batching queries can significantly reduce space and time complexity.
The method achieves constant-time haplotype reporting per substring.
It improves upon recent time-space tradeoffs in PBWT query processing.
Abstract
The positional Burrows-Wheeler Transform (PBWT) is commonly used to store haplotype panels compactly in such a way that, given a query haplotype, we can quickly find the set-maximal exact matches (SMEMs) between the query and the haplotypes in a panel. There are generally two steps in this process: first we find the maximal substrings of the query that occur in the same positions in haplotypes in the panel and then, for each such substring, report the haplotypes in the panel in which the substring occurs in the same position as in the query. Very recently, Bonizzoni, Gagie and Gao (2026) gave two time-space tradeoffs for the second step: they use either bits and time to report haplotypes in the panel, or bits and time, where is the number of runs in the panel's PBWT and ,…
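To make the second step concrete, here is a minimal sketch of what "reporting via the prefix array" means. It uses the standard uncompressed PBWT prefix-array construction (a stable partition by allele at each column), not the batching method or the run-length compressed structures discussed in the abstract. The names `build_prefix_arrays` and `report_matches` are hypothetical, the panel is assumed to be biallelic (0/1), and the interval is located by a naive linear scan purely for illustration.

```python
def build_prefix_arrays(panel):
    """Return [a_0, ..., a_N]; a_k lists haplotype indices sorted by the
    reverse of their length-k prefixes (ties broken by index).
    Assumes a biallelic panel: a list of equal-length lists of 0/1."""
    m = len(panel)
    n = len(panel[0]) if m else 0
    order = list(range(m))
    arrays = [order]
    for k in range(n):
        # Stable partition by the allele at column k: zeros first, then ones.
        zeros = [i for i in order if panel[i][k] == 0]
        ones = [i for i in order if panel[i][k] == 1]
        order = zeros + ones
        arrays.append(order)
    return arrays


def report_matches(panel, arrays, query, start, end):
    """Report haplotypes whose columns [start, end) equal query[start:end].
    In a_end these haplotypes occupy one contiguous interval, so reporting
    amounts to reading out a slice of the prefix array."""
    a = arrays[end]
    target = query[start:end]
    # Illustrative assumption: locate the interval by a linear scan.
    hits = [pos for pos, i in enumerate(a) if panel[i][start:end] == target]
    if not hits:
        return []
    lo, hi = hits[0], hits[-1]
    assert hi - lo + 1 == len(hits)  # matches are contiguous in a_end
    return a[lo:hi + 1]
```

Once the interval is known, the haplotypes to report are exactly a contiguous range of prefix-array entries, which is why efficient prefix-array access governs the cost of this step. A practical index would not store every array explicitly as this sketch does.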
