Efficient Genomic Interval Queries Using Augmented Range Trees
Chengsheng Mao, Alal Eran, Yuan Luo

TL;DR
This paper introduces an augmented range tree data structure that makes querying genomic intervals under all 13 of Allen's interval relations more efficient, supporting large-scale genome annotation tasks.
Contribution
The authors develop and compare two range tree-based methods, including an augmented range tree with fractional cascading, to efficiently support all Allen's interval relations in genomic data queries.
Findings
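The augmented range tree with fractional cascading (RTFC) achieves the best query time complexity of the three methods compared: the conventional interval tree (IT), the basic 2-dimensional range tree (2D-RT), and RTFC.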
2D-RT outperforms traditional interval trees in most relation queries.
RTFC is highly effective for large-scale genomic interval analysis.
Abstract
Efficient large-scale annotation of genomic intervals is essential for personal genome interpretation in the realm of precision medicine. There are 13 possible relations between two intervals according to Allen's interval algebra. Conventional interval trees are routinely used to identify the genomic intervals satisfying a coarse relation with a query interval, but cannot support efficient queries for more refined relations such as all of Allen's relations. We design and implement a novel approach to address this unmet need. By rewriting Allen's interval relations, we transform an interval query into a range query, then adapt and utilize range trees for querying. We implement two types of range trees, a basic 2-dimensional range tree (2D-RT) and an augmented range tree with fractional cascading (RTFC), and compare them with the conventional interval tree (IT). Theoretical analysis…
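The rewriting step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes integer coordinates, stored intervals with start < end, and hypothetical names (`RELATIONS`, `allen_to_range`, `range_query`). Each stored interval is treated as a point (start, end) in the plane, and each Allen relation against a query interval [qs, qe] becomes an axis-aligned rectangle over those points. A linear scan stands in for the range tree here, so the sketch shows only the transformation, not the query speed-up.

```python
import math

INF = math.inf

# The 13 Allen relations, read as "stored interval x <relation> query q".
RELATIONS = (
    "before", "meets", "overlaps", "starts", "during", "finishes", "equals",
    "after", "met_by", "overlapped_by", "started_by", "contains", "finished_by",
)


def allen_to_range(relation, qs, qe):
    """Return ((s_lo, s_hi), (e_lo, e_hi)): closed bounds on the start and end
    of a stored interval x so that x <relation> [qs, qe] holds.

    Assumes integer coordinates, so a strict bound such as s > qs is
    written as s >= qs + 1.
    """
    table = {
        "before":        ((-INF, INF),      (-INF, qs - 1)),
        "meets":         ((-INF, INF),      (qs, qs)),
        "overlaps":      ((-INF, qs - 1),   (qs + 1, qe - 1)),
        "starts":        ((qs, qs),         (-INF, qe - 1)),
        "during":        ((qs + 1, INF),    (-INF, qe - 1)),
        "finishes":      ((qs + 1, INF),    (qe, qe)),
        "equals":        ((qs, qs),         (qe, qe)),
        "after":         ((qe + 1, INF),    (-INF, INF)),
        "met_by":        ((qe, qe),         (-INF, INF)),
        "overlapped_by": ((qs + 1, qe - 1), (qe + 1, INF)),
        "started_by":    ((qs, qs),         (qe + 1, INF)),
        "contains":      ((-INF, qs - 1),   (qe + 1, INF)),
        "finished_by":   ((-INF, qs - 1),   (qe, qe)),
    }
    return table[relation]


def range_query(intervals, relation, qs, qe):
    """Report stored (start, end) pairs inside the rectangle for `relation`.

    The linear scan is a placeholder; a 2D range tree would answer the
    same rectangle query without visiting every interval.
    """
    (s_lo, s_hi), (e_lo, e_hi) = allen_to_range(relation, qs, qe)
    return [(s, e) for s, e in intervals
            if s_lo <= s <= s_hi and e_lo <= e <= e_hi]


intervals = [(1, 3), (3, 6), (4, 8), (5, 7), (5, 12), (12, 20), (2, 15)]
print(range_query(intervals, "overlaps", 5, 12))  # [(3, 6), (4, 8)]
```

Because the 13 rectangles are disjoint and together cover every proper interval, each stored interval is reported for exactly one relation per query.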
Taxonomy
Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · Biomedical Text Mining and Ontologies
