Faster Iterative $\phi$ Queries on the Positional BWT
Paola Bonizzoni, Travis Gagie, Younan Gao

TL;DR
This paper introduces a new decomposition scheme for the Positional BWT that enables faster iterative $\phi$ queries, which are crucial for haplotype analysis, with improved space-time tradeoffs over previous methods.
Contribution
It proposes a novel segmentation approach that reduces query time and space complexity for iterative $\phi$ operations on the PBWT, enhancing genomic data analysis efficiency.
Findings
Supports $k$ iterative $\phi$ queries in $O(\log\log_w \min(m,h) + k)$ time with $O((\tilde{r}+h)\log n)$ bits.
Provides a more space-efficient structure using $O(\tilde{r}\log h + h\log n)$ bits, with $O(k\log\log_w h)$ query time.
Expected to be practical for genomic datasets where haplotypes are fewer than sites.
Abstract
The Positional Burrows-Wheeler Transform (PBWT) is a fundamental data structure for the efficient representation and analysis of large-scale haplotype panels. For a panel of $h$ sequences over $m$ sites, a key operation is the $\phi$ query, which returns the index of the haplotype immediately preceding a given haplotype in co-lexicographic order at a given site. Efficient support for iterative $\phi$ queries is essential for haplotype matching and variation analysis. In this work, we introduce a simple and novel decomposition scheme that decomposes each haplotype row into sub-intervals, called refined segments, within which the haplotype's co-lexicographic predecessor remains unchanged across sites. We show that refined segments satisfy two key properties: (i) each segment associated with a haplotype overlaps with at most a constant number of segments of…
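To make the $\phi$ query concrete, here is a minimal Python sketch of a naive baseline. It is not the paper's refined-segment structure: it stores every prefix array explicitly and scans linearly for each query. The function names (`prefix_arrays`, `phi`, `iterated_phi`) are hypothetical, and the conventions are assumptions made for illustration: the panel is binary, `a[j]` orders haplotypes by their first `j` sites with ties broken by index, iterated queries stay at one fixed site, and `phi` returns `None` for the first haplotype in the order.

```python
def prefix_arrays(panel):
    """Return a[0..m]; a[j] lists haplotype indices in co-lexicographic
    order of their first j sites (assumed convention, ties by index)."""
    h = len(panel)
    m = len(panel[0]) if h else 0
    order = list(range(h))
    arrays = [order]
    for j in range(m):
        # Stable partition on column j: zeros keep their relative order,
        # then ones. This is the standard PBWT column update.
        zeros = [i for i in order if panel[i][j] == 0]
        ones = [i for i in order if panel[i][j] == 1]
        order = zeros + ones
        arrays.append(order)
    return arrays


def phi(arrays, i, j):
    """Haplotype immediately preceding i in a[j], or None if i is first.
    Naive linear scan; the paper's structures avoid this cost."""
    pos = arrays[j].index(i)
    return arrays[j][pos - 1] if pos > 0 else None


def iterated_phi(arrays, i, j, k):
    """Apply phi up to k times at site j, collecting each result."""
    out = []
    cur = i
    for _ in range(k):
        cur = phi(arrays, cur, j)
        if cur is None:
            break  # reached the top of the order
        out.append(cur)
    return out


if __name__ == "__main__":
    # Toy panel: 4 haplotypes over 3 sites (made-up data).
    panel = [
        [0, 1, 0],
        [1, 1, 0],
        [0, 0, 1],
        [1, 0, 1],
    ]
    arrays = prefix_arrays(panel)
    print(arrays[2])                      # [2, 3, 0, 1]
    print(phi(arrays, 0, 2))              # 3
    print(iterated_phi(arrays, 1, 2, 3))  # [0, 3, 2]
```

This baseline uses space proportional to the whole panel and time linear in the number of haplotypes per query, which is the cost the bounds listed under Findings are meant to improve on.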
