Space-efficient K-MER algorithm for generalized suffix tree

Freeson Kaniwa; Venu Madhav Kuthadi; Otlhapile Dinakenyane; Heiko Schroeder

arXiv:1703.02224·cs.DS·March 8, 2017·1 cites

TL;DR

This paper introduces a space-efficient generalized suffix tree algorithm that substantially reduces memory usage, making pattern searching practical on large datasets such as chromosomes and text corpora.

Contribution

The paper presents a novel memory-efficient generalized suffix tree algorithm that reduces space requirements by a factor of 10 when pattern size is known beforehand.

Findings

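01 Reduces suffix tree memory usage by a factor of 10 when the pattern size is known beforehand

02 Reports significant memory savings over standard linear-time suffix tree construction on chromosome data and the Pizza&Chili corpus

03 Maintains linear time complexity for pattern searching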

Abstract

Suffix trees are very fast for pattern searching, answering queries in O(m) time, where m is the pattern size. Unfortunately, their high memory requirements make them impractical for huge amounts of data. We present a memory-efficient algorithm for a generalized suffix tree that reduces the space required by a factor of 10 when the size of the pattern is known beforehand. Experiments on chromosomes and the Pizza&Chili corpus show significant advantages of our algorithm over standard linear-time suffix tree construction in terms of memory usage for pattern searching.
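The abstract does not describe the construction itself, so the following is only an illustrative sketch and not the authors' algorithm. It assumes the simplest way a known pattern size could save space: index several strings in one uncompressed trie, but insert only the first k characters of each suffix, so the structure never grows deeper than k. The names `TruncatedSuffixTrie`, `Node`, and `find` are hypothetical and chosen for this example.

```python
class Node:
    """Trie node; __slots__ keeps the per-node overhead small."""
    __slots__ = ("children", "occurrences")

    def __init__(self):
        self.children = {}      # next character -> Node
        self.occurrences = []   # (string_id, start) of suffixes ending here


class TruncatedSuffixTrie:
    """Generalized suffix trie cut off at depth k (assumed simplification)."""

    def __init__(self, strings, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.root = Node()
        for string_id, text in enumerate(strings):
            for start in range(len(text)):
                node = self.root
                # Insert only the first k characters of this suffix.
                for ch in text[start:start + k]:
                    node = node.children.setdefault(ch, Node())
                node.occurrences.append((string_id, start))

    def find(self, pattern):
        """Return sorted (string_id, start) pairs where pattern occurs."""
        if len(pattern) > self.k:
            raise ValueError("pattern is longer than the indexed depth k")
        node = self.root
        for ch in pattern:          # O(m) descent, m = len(pattern)
            node = node.children.get(ch)
            if node is None:
                return []
        # Every truncated suffix below this node starts with the pattern.
        hits, stack = [], [node]
        while stack:
            current = stack.pop()
            hits.extend(current.occurrences)
            stack.extend(current.children.values())
        return sorted(hits)


index = TruncatedSuffixTrie(["banana", "bandana"], k=3)
print(index.find("ana"))
print(index.find("nd"))
```

In this sketch the descent takes O(m) steps for a pattern of length m, and reporting the matches adds time proportional to the size of the subtree below the matched node. A pattern longer than k cannot be answered from the truncated structure, which is why `find` rejects it rather than returning an empty result.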

Taxonomy

Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · Advanced Image and Video Retrieval Techniques