Space-efficient K-MER algorithm for generalized suffix tree
Freeson Kaniwa, Venu Madhav Kuthadi, Otlhapile Dinakenyane, Heiko Schroeder

TL;DR
This paper introduces a space-efficient generalized suffix tree algorithm that significantly reduces memory usage, making pattern searching practical on large datasets such as chromosomes and text corpora.
Contribution
The paper presents a novel memory-efficient generalized suffix tree algorithm that reduces space requirements by a factor of 10 when pattern size is known beforehand.
Findings
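Reduces suffix tree memory usage by a factor of 10 when the pattern size is known beforehand
Demonstrates significant memory savings over standard linear-time suffix tree construction on chromosome data and the Pizza&Chili corpus
Retains O(m) pattern search time, where m is the pattern size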
Abstract
Suffix trees have proven to be very fast for pattern searching, yielding O(m) time, where m is the pattern size. Unfortunately, their high memory requirements make it impractical to work with huge amounts of data. We present a memory-efficient algorithm for a generalized suffix tree which reduces the space size by a factor of 10 when the size of the pattern is known beforehand. Experiments on chromosomes and the Pizza&Chili corpus show significant advantages of our algorithm over standard linear-time suffix tree construction in terms of memory usage for pattern searching.
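To illustrate why knowing the pattern size in advance can save space, here is a minimal sketch of a depth-limited generalized suffix trie. It is not the authors' algorithm, which this page does not describe in detail; it only shows the general idea that if no pattern is longer than k, nothing below depth k needs to be stored. The names `Node`, `build_truncated_trie`, and `search` are hypothetical, and the sketch uses an uncompressed trie with per-node occurrence lists for simplicity, so it makes no attempt to reproduce the reported factor-of-10 saving.

```python
from typing import Dict, List, Tuple


class Node:
    """One trie node: child links plus every (text_id, start) that reaches it."""
    __slots__ = ("children", "occurrences")

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.occurrences: List[Tuple[int, int]] = []


def build_truncated_trie(texts: List[str], k: int) -> Node:
    """Insert only the first k characters of every suffix of every text.

    Assumption: all later queries have length <= k, so deeper levels
    of a full suffix tree would never be visited.
    """
    root = Node()
    for text_id, text in enumerate(texts):
        for start in range(len(text)):
            node = root
            for ch in text[start:start + k]:
                child = node.children.get(ch)
                if child is None:
                    child = Node()
                    node.children[ch] = child
                node = child
                node.occurrences.append((text_id, start))
    return root


def search(root: Node, pattern: str, k: int) -> List[Tuple[int, int]]:
    """Return all (text_id, start) occurrences of pattern in O(m) node steps."""
    if len(pattern) > k:
        raise ValueError("pattern is longer than the depth limit k")
    node = root
    for ch in pattern:
        nxt = node.children.get(ch)
        if nxt is None:
            return []
        node = nxt
    return node.occurrences


# Usage: index two strings for patterns of length at most 3.
trie = build_truncated_trie(["banana", "bandana"], k=3)
hits = search(trie, "ana", k=3)  # [(0, 1), (0, 3), (1, 4)]
```

The search walks one node per pattern character, so lookup cost depends on the pattern length m rather than on the text size, matching the O(m) behaviour described in the abstract. A practical implementation would compress unary paths and avoid storing an occurrence list at every node.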
Taxonomy
Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · Advanced Image and Video Retrieval Techniques
