Compressed Spaced Suffix Arrays

Travis Gagie; Giovanni Manzini; Daniel Valenzuela

arXiv:1312.3422·cs.DS·March 11, 2014·1 cites

Open Access

TL;DR

This paper introduces a method to compress spaced suffix arrays (SSAs) relative to standard suffix arrays, enabling efficient similarity searches in bioinformatics with reduced space requirements.

Contribution

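It proves a theoretical upper bound on the space needed to store an SSA when the SA is already available, and presents a practical approach that keeps fast random access to the SSA. This avoids keeping a separate linear-size data structure (a hash table or an SSA) for each seed, as existing approaches do.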

Findings

01 A theoretical upper bound on the space needed to store an SSA when the SA is already available.

02 Experiments indicate that the approach works even better in practice.

03 Fast random access to the SSA is still supported after compression.

Abstract

Spaced seeds are important tools for similarity search in bioinformatics, and using several seeds together often significantly improves their performance. With existing approaches, however, for each seed we keep a separate linear-size data structure, either a hash table or a spaced suffix array (SSA). In this paper we show how to compress SSAs relative to normal suffix arrays (SAs) and still support fast random access to them. We first prove a theoretical upper bound on the space needed to store an SSA when we already have the SA. We then present experiments indicating that our approach works even better in practice.
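
To make the two objects in the abstract concrete, here is a minimal Python sketch of a suffix array and a spaced suffix array built naively by sorting. It does not show the paper's compression scheme. The function names are hypothetical, and two conventions are assumptions of this sketch rather than something stated on this page: the seed is a string of '0' and '1' characters applied periodically along each suffix, and ties between equal keys are broken by text position.

```python
def suffix_array(text):
    """Naive SA: start positions sorted by the suffix that begins there."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def spaced_key(text, i, seed):
    """Characters of text[i:] that fall on the seed's '1' positions.

    Assumption of this sketch: the seed repeats periodically, so the key
    covers the whole suffix rather than only the first len(seed) characters.
    """
    m = len(seed)
    return "".join(c for k, c in enumerate(text[i:]) if seed[k % m] == "1")


def spaced_suffix_array(text, seed):
    """Naive SSA: positions sorted by their seed-filtered suffix.

    Ties (equal filtered suffixes) are broken by position; this is an
    assumption made here so that the result is deterministic.
    """
    return sorted(range(len(text)),
                  key=lambda i: (spaced_key(text, i, seed), i))


if __name__ == "__main__":
    text = "abracadabra"
    print("SA :", suffix_array(text))
    print("SSA:", spaced_suffix_array(text, "101"))
```

With the all-ones seed "1" nothing is filtered out, so the SSA coincides with the SA. For other seeds the SSA is a different permutation of the same positions, which is why storing one uncompressed SSA per seed costs linear space each time.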

Taxonomy

Topics: Algorithms and Data Compression · Advanced Image and Video Retrieval Techniques · Advanced Data Compression Techniques