ERA Revisited: Theoretical and Experimental Evaluation

Matevž Jekovec; Andrej Brodnik

arXiv:1609.09654 · cs.DC · October 3, 2016 · 1 cites

Open Access

TL;DR

This paper provides a comprehensive analysis of the ERA suffix tree construction algorithm, combining theoretical bounds, empirical validation, and insights into input characteristics affecting performance, serving as a foundation for future parallel text indexing research.

Contribution

It offers the first theoretical analysis of ERA under the PEM model, empirically validates the analysis, and discusses input conditions where the algorithm underperforms.

Findings

01. Theoretical bounds align with empirical results.

02. Certain fundamental characteristics of the input text cause the fastest practical suffix tree construction algorithms to fail.

03. ERA is the fastest practical suffix tree construction algorithm to date.

Abstract

Efficient construction of the suffix tree for a given input text has been an active area of research since the data structure was first introduced. Both theoretical computer scientists and engineers have tackled the problem. In this paper we focus on ERA, the fastest practical suffix tree construction algorithm to date. First, we provide a theoretical analysis of the algorithm in the PEM model of computation, assuming uniformly random input text, and relate the result to the lower bounds. Second, we empirically confirm the theoretical results in different test scenarios that expose the critical terms. Third, we discuss the fundamental characteristics of input texts on which the fastest practical suffix tree construction algorithms fail. This paper serves as a foundation for further research in parallel text indexing.

Taxonomy

Topics: Algorithms and Data Compression · Advanced Image and Video Retrieval Techniques · DNA and Biological Computing