TL;DR
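This paper compresses the suffix trees of individual sequences in a repetitive collection relative to the suffix tree of a reference sequence. The resulting structures are almost as small as the smallest compressed suffix trees for repetitive collections while keeping query times competitive.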
Contribution
It proposes a novel approach to relative suffix tree compression tailored for repetitive sequence collections, improving space efficiency while maintaining competitive speed.
Findings
Almost as small as the smallest compressed suffix trees for repetitive collections.
Query times competitive with the fastest compressed suffix trees.
Effective for collections of similar sequences, such as individual genomes.
Abstract
Suffix trees are one of the most versatile data structures in stringology, with many applications in bioinformatics. Their main drawback is their size, which can be tens of times larger than the input sequence. Much effort has been put into reducing the space usage, leading ultimately to compressed suffix trees. These compressed data structures can efficiently simulate the suffix tree, while using space proportional to a compressed representation of the sequence. In this work, we take a new approach to compressed suffix trees for repetitive sequence collections, such as collections of individual genomes. We compress the suffix trees of individual sequences relative to the suffix tree of a reference sequence. These relative data structures provide competitive time/space trade-offs, being almost as small as the smallest compressed suffix trees for repetitive collections, and competitive…
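To make the idea of compressing "relative to a reference" concrete, here is a minimal sketch of the general principle on plain strings. It is not the paper's encoding of suffix trees: it only illustrates how a sequence that closely resembles a reference can be stored as a short list of copies from that reference. The function names `relative_parse` and `relative_decode`, the phrase layout, and the naive quadratic search are all assumptions made for illustration.

```python
def relative_parse(reference: str, target: str) -> list[tuple[int, int, str]]:
    """Greedily split target into phrases (ref_start, length, next_char).

    Each phrase copies `length` characters from reference[ref_start:] and
    then appends one explicit character (empty only at the end of target).
    Illustrative sketch with a naive search, not an efficient parser.
    """
    phrases = []
    i = 0
    while i < len(target):
        best_start, best_len = 0, 0
        # Find the longest prefix of target[i:] that occurs in the reference.
        for start in range(len(reference)):
            length = 0
            while (start + length < len(reference)
                   and i + length < len(target)
                   and reference[start + length] == target[i + length]):
                length += 1
            if length > best_len:
                best_start, best_len = start, length
        # Store the character following the copy explicitly, if there is one.
        end = i + best_len
        next_char = target[end] if end < len(target) else ""
        phrases.append((best_start, best_len, next_char))
        i = end + len(next_char)
    return phrases


def relative_decode(reference: str, phrases: list[tuple[int, int, str]]) -> str:
    """Rebuild the target from the reference and its phrase list."""
    return "".join(reference[s:s + n] + c for s, n, c in phrases)


reference = "ACGTACGTTAGC"
target = "ACGTACCTTAGC"  # differs from the reference in one position
phrases = relative_parse(reference, target)
assert relative_decode(reference, phrases) == target
```

In this toy case the single substitution splits the target into two phrases, so the cost of storing the target grows with the number of differences from the reference rather than with its length. This is the intuition behind relative data structures for collections of similar sequences.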
