Practical Random Access to SLP-Compressed Texts

Travis Gagie; Tomohiro I; Giovanni Manzini; Gonzalo Navarro; Hiroshi Sakamoto; Louisa Seelbach Benkner; Yoshimasa Takabatake

arXiv:1910.07145 · cs.DS · July 21, 2020

TL;DR

This paper introduces a new grammar encoding that is about as small as the practical state of the art but answers random access queries faster, making random access to grammar-compressed texts practical for large datasets such as genomic databases.

Contribution

It presents a new grammar encoding that achieves faster random access queries while maintaining a size comparable to the state of the art.

Findings

01. Faster random access queries compared to previous methods

02. Comparable compression size to existing practical approaches

03. Enhanced applicability to large-scale datasets like genomics

Abstract

Grammar-based compression is a popular and powerful approach to compressing repetitive texts but until recently its relatively poor time-space trade-offs during real-life construction made it impractical for truly massive datasets such as genomic databases. In a recent paper (SPIRE 2019) we showed how simple pre-processing can dramatically improve those trade-offs, and in this paper we turn our attention to one of the features that make grammar-based compression so attractive: the possibility of supporting fast random access. This is an essential primitive in many algorithms that process grammar-compressed texts without decompressing them and so many theoretical bounds have been published about it, but experimentation has lagged behind. We give a new encoding of grammars that is about as small as the practical state of the art (Maruyama et al., SPIRE 2013) but with significantly faster…
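To make the random access primitive concrete, here is a minimal Python sketch of the textbook approach on a straight-line program (SLP): store the expansion length of every nonterminal, then walk down from the start symbol, going left or right depending on where the requested position falls. This is not the encoding proposed in the paper, which is not described in the text above. The rule representation, the names `expansion_lengths` and `access`, and the toy grammar are all assumptions made for illustration.

```python
from typing import Dict, Tuple, Union

# Hypothetical representation: a rule maps a nonterminal either to a single
# character (terminal rule) or to a pair of nonterminals (binary rule).
Rule = Union[str, Tuple[str, str]]


def expansion_lengths(rules: Dict[str, Rule]) -> Dict[str, int]:
    """Compute the length of the string each nonterminal expands to."""
    lengths: Dict[str, int] = {}

    def length(symbol: str) -> int:
        # Memoised recursion; fine for a sketch, but a very deep grammar
        # would need an iterative traversal instead.
        if symbol not in lengths:
            rule = rules[symbol]
            if isinstance(rule, str):
                lengths[symbol] = 1
            else:
                left, right = rule
                lengths[symbol] = length(left) + length(right)
        return lengths[symbol]

    for symbol in rules:
        length(symbol)
    return lengths


def access(rules: Dict[str, Rule], lengths: Dict[str, int],
           start: str, i: int) -> str:
    """Return character i (0-based) of the text generated by `start`,
    without decompressing the whole text."""
    if not 0 <= i < lengths[start]:
        raise IndexError("position out of range")
    symbol = start
    while True:
        rule = rules[symbol]
        if isinstance(rule, str):
            return rule              # reached a terminal rule
        left, right = rule
        if i < lengths[left]:
            symbol = left            # position lies in the left child
        else:
            i -= lengths[left]       # skip the left child's expansion
            symbol = right


# Toy grammar (an assumption for illustration) generating "abaababa".
RULES: Dict[str, Rule] = {
    "A": "a",
    "B": "b",
    "C": ("A", "B"),   # ab
    "D": ("C", "A"),   # aba
    "E": ("D", "C"),   # abaab
    "S": ("E", "D"),   # abaababa
}
LENGTHS = expansion_lengths(RULES)
TEXT = "".join(access(RULES, LENGTHS, "S", i) for i in range(LENGTHS["S"]))
```

Each query in this sketch takes time proportional to the depth of the path it follows in the derivation tree, so its cost depends on how balanced the grammar is. Practical encodings differ mainly in how compactly they store the rules and lengths and in how they shorten that walk.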

Code & Models

Repositories

itomomoti/ShapedSlp
none · Official
