Subpath Queries on Compressed Graphs: a Survey

Nicola Prezza

arXiv:2011.10008·cs.DS·December 15, 2020

TL;DR

This survey reviews the evolution of text indexing from suffix trees to advanced compressed indexes for labeled graphs, highlighting their impact on bioinformatics and regular language processing.

Contribution

It provides a comprehensive overview of the development of compressed graph indexes and their applications in indexing regular languages and complex data structures.

Findings

01

Compressed indexes enable efficient pattern matching in large texts.

02

Recent advances extend indexing techniques to labeled graphs and regular languages.

03

These developments have significant implications for bioinformatics and automata theory.

Abstract

Text indexing is a classical algorithmic problem that has been studied for over four decades: given a text $T$, pre-process it off-line so that, later, we can quickly count and locate the occurrences of any string (the query pattern) in $T$ in time proportional to the query's length. The earliest optimal-time solution to the problem, the suffix tree, dates back to 1973 and requires up to two orders of magnitude more space than the plain text just to be stored. In the year 2000, two breakthrough works showed that efficient queries can be achieved without this space overhead: a fast index can be stored in space proportional to the text's entropy. These contributions had an enormous impact in bioinformatics: nowadays, virtually any DNA aligner employs compressed indexes. Recent trends considered more powerful compression schemes (dictionary compressors) and generalizations of the problem to…
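To make the count and locate queries in the abstract concrete, here is a minimal Python sketch that uses a plain, uncompressed suffix array. It is an illustration only, not one of the indexes the survey covers: it is neither a suffix tree nor a compressed index. The function names (`build_suffix_array`, `count`, `locate`) are hypothetical, construction is deliberately naive, and each query takes roughly O(m log n) string comparisons rather than time proportional to the pattern length alone.

```python
def build_suffix_array(text):
    """Return the starting positions of all suffixes of text, in sorted order.

    Naive construction for illustration only; practical indexes use
    much faster and more space-efficient algorithms.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def _boundary(text, sa, pattern, include_equal):
    """Binary search over the suffix array.

    Returns the first index whose length-m suffix prefix is >= pattern
    (include_equal=False) or > pattern (include_equal=True).
    """
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + m]
        if prefix < pattern or (include_equal and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def count(text, sa, pattern):
    """Number of occurrences of pattern in text."""
    return _boundary(text, sa, pattern, True) - _boundary(text, sa, pattern, False)


def locate(text, sa, pattern):
    """Sorted starting positions of all occurrences of pattern in text."""
    start = _boundary(text, sa, pattern, False)
    end = _boundary(text, sa, pattern, True)
    return sorted(sa[start:end])


if __name__ == "__main__":
    T = "banana"
    SA = build_suffix_array(T)
    print(count(T, SA, "ana"), locate(T, SA, "ana"))
```

The sketch relies on the fact that all suffixes starting with the pattern form one contiguous range of the sorted suffix array, so two binary searches delimit every occurrence.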
